PREFACE TO VOLUME II


This volume falls into five sections. The first, comprising chapters 17 to 20, deals
with Estimation. The second, comprising chapters 21, 23, 24 and 26 to 28, covers the
Theory of Statistical Tests, including the Analysis of Variance and Multivariate Analysis. 
The third, consisting of chapter 22, deals with Regression Analysis and completes the 
account of statistical relationship begun in chapters 13 to 16 of Volume I. In the fourth, 
chapter 25, I have tried to give an introductory account of the reaction of theoretical 
considerations on the Design of Statistical Inquiries. Finally, the fifth, comprising chapters 
29 and 30, deals with the Analysis of Time-Series. 

The literature of statistical theory is now so vast that it seemed worth while devoting 
considerable space to a bibliography, which is given in Appendix B. Although it is far 
from complete, I hope that it will serve its purpose in guiding the student to the main
sources.

The chief problem in the writing of this volume arose in connection with the logic of 
statistical inference. Whenever possible I have kept the treatment objective. It is, 
I consider, unfair in a book of this kind not to present all sides of a case, particularly when 
there is so much disagreement among the authorities. Some day I hope to show that 
this disagreement is more apparent than real, and that all the existing theories of inference 
in probability differ essentially only in matters of taste in the choice of postulates. But 
this book is not the place for such work, and for the present I am content to state the
position and to leave the reader to exercise his own choice. 

The difficulty became most acute in dealing with confidence intervals and fiducial 
inference, where two approaches which at first sight appear identical can lead to different 
results. Rather than try to reconcile them I have written a separate chapter on each. 
Professor E. S. Pearson was kind enough to read the manuscript of chapter 19 and Professor 
R. A. Fisher that of chapter 20, so that I think their respective views are, at any rate, not 
misrepresented. I am very grateful to them both for their help in this connection. 

My thanks are also due to Mr. P. A. Moran and Mr. A. J. H. Morrell, who cheerfully 
undertook to help with the proof reading and to whose painstaking scrutiny I owe the 
removal of a number of obscurities and errors. I shall be grateful to any reader who 
detects and notifies me of any further slips which have evaded us. Once again I have also 
to thank the publishers and the printers for the trouble they have taken in the production 
of the finished work. 

M. G. K. 

London, 

April 1946.
CHAPTER 17

ESTIMATION: LIKELIHOOD 


The Problem 

17.1. On several occasions in previous chapters we have encountered the problem
of estimating from a sample the values of the parameters of the parent population. We
have hitherto dealt on somewhat intuitive lines with such questions as arose; for example,
in the theory of large samples we have taken the means and moments of the sample to be
satisfactory estimates of the corresponding means and moments in the parent.

We now proceed to study this branch of the subject in more detail. In the earlier 
part of the present chapter we shall examine the sort of criteria which are required of
a “good” estimate and discuss the question whether there exist “best” estimates in
any acceptable sense of the term. In the remainder of the chapter and in Chapter 18 
we shall consider various methods of obtaining estimates with the required properties. 
In Chapters 19 and 20 we shall look at the same problem from a rather different point of 
view and discuss the theories of confidence intervals and fiducial limits. 

17.2. It will be evident that if a sample is not random and nothing precise is known 
about the nature of the bias operating when it was chosen, very little can be inferred from 
it about the parent population. Certain conclusions of a trivial kind are sometimes pos- 
sible — for instance, if we take ten turnips from a pile of 100 and find that they weigh ten 
pounds altogether, the mean weight of turnips in the pile must be greater than one-tenth of 
a pound ; but such information is rarely of value, and estimation based on biassed samples 
remains very much a matter of individual opinion and cannot be reduced to exact and 
objective terms. We shall therefore confine our attention to random samples only. Our
general problem, in its simplest terms, is then to estimate the value of a parameter in the
parent from the information given by the sample. In the first instance we consider
the case when only one parameter is to be estimated. The case of several parameters
will be discussed later.

17.3. Let us in the first place consider what we mean by “estimation”. We know,
or assume as a working hypothesis, that the parent population is distributed in a form
which would be completely determinate if we knew the value of some parameter $\theta$. We
are given a sample of values $x_1, \ldots, x_n$. We require to determine, with the aid of the
$x$'s, a number which can be taken to be the value of $\theta$, or a range of numbers which can
be taken to include that value.

Now a single sample, considered by itself, may be rather improbable, and any estimate
based on it may therefore differ considerably from the true value of $\theta$. It appears,
therefore, that we cannot expect to find any method of estimation which can be guaranteed
to give us a close estimate of $\theta$ on every occasion and for every sample. We must
content ourselves with formulating a rule which will give good results “in the long run”
or “on the average”, or which has “a high probability of success”: phrases which
express the fundamental fact that we have to regard our method of estimation as generating
a population of estimates and to assess its merits according to the properties of this
population.

17.4. It will clarify our ideas considerably if we draw a distinction between the
method or rule of estimation, which, following Pitman, we shall call an Estimator, and the
value to which it gives rise in particular cases, the Estimate. The distinction is the same
as that between a function $f(x)$, regarded as defined for a range of the variable $x$, and the
particular value which the function assumes, say $f(a)$, for a specified value of $x$ equal to $a$.
Our problem is not to find estimates, but to find Estimators. We do not reject a method
because it gives a bad result in a particular case (in the sense that the estimate differs
materially from the true value). We should only reject it if it gave bad results in the long
run, that is to say, if the population of possible values of the estimator were seriously
discrepant with the value of $\theta$. The merit of the estimator is judged by the population
of estimates to which it gives rise. It is itself a random variable and has a distribution,
to which we shall frequently have occasion to refer.


17.5. In the theory of large samples we have often taken as an estimator of a
parameter $\theta$ a statistic $t$ calculated from the sample in exactly the same way as $\theta$ is
calculated from the population, e.g. the sample mean is taken as an estimate of the parent mean.
Let us examine how this procedure can be justified. Consider the case when the parent
population is

$$ dF = \frac{1}{\sqrt{2\pi}}\exp\left\{-\tfrac{1}{2}(x-\theta)^2\right\} dx, \qquad -\infty < x < \infty. \tag{17.1} $$


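Requiring an estimator for the parent mean $\theta$, we take

$$ t = \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{17.2} $$

The distribution of $t$ is

$$ dF = \frac{\sqrt{n}}{\sqrt{2\pi}}\exp\left\{-\tfrac{n}{2}(t-\theta)^2\right\} dt, \tag{17.3} $$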


that is to say, $t$ is distributed normally about $\theta$ with variance $1/n$. We notice two things
about this distribution: (a) it has a mean (and median and mode) at the true value $\theta$,
and (b) as $n$ increases, the scatter of possible values of $t$ about $\theta$ becomes smaller, so that
the probability that a given $t$ differs by more than a fixed amount from $\theta$ decreases. We
may say that the accuracy of the estimator increases as $n$ increases, or simply with $n$.


17.6. Generally, it will be clear that the phrase “accuracy increasing with $n$” has
a definite meaning whenever the sampling distribution of $t$ has a variance which decreases
with $1/n$ and a central value which is either identical with $\theta$ or differs from it by a quantity
which also decreases with $1/n$. Many of the estimators with which we are commonly
concerned are of this type, but there are exceptions. Consider, for example, the Cauchy
population

$$ dF = \frac{1}{\pi}\,\frac{dx}{1+(x-\theta)^2}, \qquad -\infty < x < \infty. \tag{17.4} $$

The mean (assuming that we conventionally agree that it exists) is at $x = \theta$. But if we
try to estimate $\theta$ by the mean-statistic $t$ we have, for the distribution of $t$,

$$ dF = \frac{1}{\pi}\,\frac{dt}{1+(t-\theta)^2}, \qquad -\infty < t < \infty. \tag{17.5} $$

(Cf. Example 10.1, vol. I, pp. 233-4.) In this case the distribution of $t$ is the same
as that of any single value of the sample, and does not increase in accuracy as $n$ increases.
Consistence

17.7. The property of possessing increasing accuracy is evidently a very desirable
one; and indeed, if the variance of the sampling distribution decreases with increasing
$n$ it is necessary that its central value should tend to $\theta$, for otherwise the estimator would
have values differing systematically from the true value and would be useless, not to say
dangerous. We therefore formulate our first criterion for a suitable estimator as follows:

An estimator $t_n$, computed from a sample of $n$ values, will be said to be a consistent
estimator of $\theta$ if, for any positive $\epsilon$ and $\eta$, however small, there is some $N$ such that the
probability that

$$ |t_n - \theta| < \epsilon \tag{17.6} $$

is greater than $1 - \eta$ for all $n > N$. In the notation of the theory of probability,

$$ P\{|t_n - \theta| < \epsilon\} > 1 - \eta, \qquad n > N. \tag{17.7} $$

The definition bears an obvious analogy to the definition of convergence in the mathematical
sense. A statistic $t_n$ such that, given any fixed small quantity $\epsilon$, we can find a large enough
sample number such that for all samples over that size the probability that $t_n$ differs from
the true value by more than $\epsilon$ is as near zero as we please, is said to converge in
probability to $\theta$. Thus $t_n$ is a consistent estimator of $\theta$ if it converges to $\theta$ in probability.

Example 17.1

The sample mean is a consistent estimator of the parameter $\theta$ in the population (17.1).
This we have already established in general argument, but more formally the proof would
proceed as follows:

Suppose we are given $\epsilon$. From (17.3) we see that $(t-\theta)\sqrt{n}$ is distributed normally
about zero with unit variance. Thus the probability that $|(t-\theta)\sqrt{n}| < \epsilon\sqrt{n}$ is the
value of the normal integral between limits $\pm\epsilon\sqrt{n}$. Given any positive $\eta$ we can
always take $n$ large enough for this quantity to be greater than $1-\eta$, and it will continue
to be so for any larger $n$. $N$ may therefore be determined and the inequality (17.7) is
satisfied.
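As a numerical check on Example 17.1, the probability in (17.7) can be evaluated exactly. This is an illustrative sketch in Python and not part of the original argument. Since $(t-\theta)\sqrt{n}$ is a unit normal variate, $P\{|t-\theta|<\epsilon\} = \operatorname{erf}(\epsilon\sqrt{n/2})$. The function names are hypothetical. `smallest_n` returns the first sample number at which the probability exceeds $1-\eta$; because the probability increases with $n$, the inequality then holds for every larger $n$.

```python
import math

def prob_within(eps: float, n: int) -> float:
    """P{|t - theta| < eps} when t is the mean of n values from (17.1).

    (t - theta) * sqrt(n) is a unit normal variate, so this is the normal
    integral between -eps*sqrt(n) and +eps*sqrt(n), i.e. erf(eps*sqrt(n/2)).
    """
    return math.erf(eps * math.sqrt(n / 2.0))

def smallest_n(eps: float, eta: float) -> int:
    """Smallest n with P{|t - theta| < eps} > 1 - eta.

    The probability increases with n, so the inequality then also holds
    for every larger n, as (17.7) requires.
    """
    n = 1
    while prob_within(eps, n) <= 1.0 - eta:
        n += 1
    return n

if __name__ == "__main__":
    for eps, eta in [(0.5, 0.05), (0.1, 0.05), (0.1, 0.01)]:
        n = smallest_n(eps, eta)
        print(f"eps={eps}, eta={eta}: n={n}, P={prob_within(eps, n):.4f}")
```

With $\epsilon = 0.1$ and $\eta = 0.05$ the search stops at $n = 385$, the first $n$ for which $0.1\sqrt{n}$ exceeds the normal 2.5 per cent point 1.96.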

Example 17.2 

Suppose we have a statistic $t_n$ whose mean value differs from $\theta$ by a quantity of order $n^{-1}$,
whose variance is of order $n^{-1}$, and which tends to normality as $n$ increases. Clearly
$t_n - \theta$ will then tend to zero in probability and $t_n$ will be consistent. This covers
a great many statistics encountered in practice.

Unbiassed Estimators 

17.8. The property of consistence is a limiting property, that is to say, it concerns
the behaviour of an estimator as the sample number tends to infinity. It requires nothing
of the behaviour for finite $n$, and if there exists one consistent estimator $t$ we may construct
infinitely many others; e.g.

$$ \frac{n-a}{n-b}\,t $$

is also consistent. We have seen that in some circumstances a consistent estimator of the
mean is the sample mean

$$ \bar{x} = \frac{1}{n}\sum_{j=1}^{n} x_j. \tag{17.8} $$
But so is

$$ \frac{1}{n-1}\sum_{j=1}^{n} x_j. \tag{17.9} $$

Why do we prefer one to the other? Intuitively it seems absurd to divide the sum of
$n$ quantities by anything other than their number $n$. We shall see in a moment, however,
that intuition is not a very reliable guide on such matters. There are reasons for preferring

$$ \frac{1}{n-1}\sum_{j=1}^{n}(x_j-\bar{x})^2 \tag{17.10} $$

to

$$ \frac{1}{n}\sum_{j=1}^{n}(x_j-\bar{x})^2 \tag{17.11} $$

as an estimator of the parent variance, notwithstanding that the latter is the sample
variance.


17.9. Consider the sampling distribution of an estimator $t$. If the estimator is
consistent, its distribution must, for large samples, have a central value in the
neighbourhood of $\theta$. We may choose among the field of consistent estimators by requiring that
$\theta$ shall be equated to this central value not merely for large, but for all samples. Whether
we choose as the appropriate central value the mean, the median or the mode is to some
extent a matter of taste. We shall consider below what follows if we select the mode
(which gives us the maximum likelihood estimators). For the present we discuss the mean.

If we require that for all $n$ the mean value of $t$ shall be $\theta$, we define what is known as
an unbiassed estimator:

$$ E(t) = \theta. \tag{17.12} $$

This is an unfortunate word, like so many in statistics. There is nothing except
convenience to exalt the arithmetic mean above other measures of location as a criterion of
bias. We might equally well have chosen the mode as determining the “unbiassed”
estimator, in which case the mean estimator would be “biassed” whenever it gave a
different result. Since the use of “unbiassed” in connection with the mean is fairly
widespread, however, we shall continue to use it.*


Example 17.3 
Since

$$ E(\bar{x}) = E\left(\frac{1}{n}\sum_{j=1}^{n} x_j\right) = \frac{1}{n}\sum_{j=1}^{n} E(x_j) = \mu_1', $$

the mean-statistic is an unbiassed estimator of the parent mean whenever the latter exists.
But the sample variance is not an unbiassed estimator of the parent variance. We have

$$ E\left\{\sum_{j}(x_j-\bar{x})^2\right\} = E\left\{\frac{n-1}{n}\sum_{j} x_j^2 - \frac{1}{n}\sum_{j\neq k} x_j x_k\right\} = (n-1)\mu_2' - (n-1)\mu_1'^{2} = (n-1)\mu_2. $$

* The word has already occurred in vol. I, p. 200, in this sense. It may be spelt with either one
or two s's. My usage, I am afraid, is not consistent, but in this volume I use two.

Thus $\frac{1}{n}\sum(x-\bar{x})^2$ has a mean value $\frac{n-1}{n}\mu_2$. On the other hand, an unbiassed estimator
is given by

$$ \frac{1}{n-1}\sum(x-\bar{x})^2, $$

and for this reason it is sometimes preferred to the sample variance. There are other
reasons which will appear when we come to study the analysis of variance.
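The bias of the divisor $n$ can be checked exactly, with no sampling error, by averaging over every possible sample. The sketch below uses hypothetical function names and an illustrative finite parent, the values 1, 2 and 6 taken with equal probability. Samples of $n$ are drawn with replacement, so the values are independent, and the calculation uses exact rational arithmetic. The average of the divisor-$n$ estimator should be $\frac{n-1}{n}\mu_2$, and the average of the divisor-$(n-1)$ estimator should be $\mu_2$.

```python
from fractions import Fraction
from itertools import product

def parent_variance(population):
    """mu_2 of an equally likely finite parent population."""
    mean = Fraction(sum(population), len(population))
    return sum((Fraction(x) - mean) ** 2 for x in population) / len(population)

def expected_variance_estimates(population, n):
    """Exact means of sum((x - xbar)^2)/n and sum((x - xbar)^2)/(n - 1),
    averaged over all len(population)**n equally likely samples of size n
    drawn with replacement."""
    total_div_n = Fraction(0)
    total_div_n1 = Fraction(0)
    count = 0
    for sample in product(population, repeat=n):
        xbar = Fraction(sum(sample), n)
        ss = sum((Fraction(x) - xbar) ** 2 for x in sample)
        total_div_n += ss / n
        total_div_n1 += ss / (n - 1)
        count += 1
    return total_div_n / count, total_div_n1 / count

if __name__ == "__main__":
    pop = [1, 2, 6]
    mu2 = parent_variance(pop)   # 14/3
    for n in (2, 3, 4):
        biased, unbiased = expected_variance_estimates(pop, n)
        print(n, biased, Fraction(n - 1, n) * mu2, unbiased)
```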

Efficient Estimators 

17.10. In general there will exist more than one consistent estimator of a parameter, 
even if we confine ourselves only to unbiassed estimators. Consider once again the esti- 
mation of the mean of a normal population with known variance. The sample mean is 
consistent and unbiassed. We will now prove that the same is true of the median. 

Consideration of symmetry is enough to show that the median is an unbiassed estimate
of the parent mean, which is, of course, the same as the parent median. For large $n$ the
distribution of the median tends to the normal form (cf. Example 9.7, vol. I, p. 213),

$$ dF \propto \exp\left\{-2nf_1^2(x-\theta)^2\right\} dx, \tag{17.13} $$

where $f_1$ is the median ordinate of the parent, in our present case $1/\sqrt{2\pi} = 0.3989$. The
variance tends to zero and the estimator is consistent. Its variance is $\pi/2n$.

17.11. We are therefore at liberty to seek for further criteria to choose between 
estimators with the common property of consistence. Such a criterion arises naturally 
if we consider the sampling variances of the estimators. Generally speaking, the estimator 
with the smaller variance will be grouped more closely round the value 0 ; this will certainly 
be so for distributions of the normal type. An estimator with a smaller variance will 
therefore deviate less, on the average, from the true value than one with a larger variance. 
Hence we may reasonably regard it as better or more efficient.

If, of two consistent estimators $t_1$ and $t_2$, we have $\operatorname{var} t_1 < \operatorname{var} t_2$ for all $n$, then $t_1$ is
more efficient than $t_2$ for all sample sizes. It is possible to have $\operatorname{var} t_1 < \operatorname{var} t_2$ for some
ranges of $n$ and $\operatorname{var} t_1 > \operatorname{var} t_2$ for others, in which case the estimators are more or less
efficient in different ranges.

In the case of mean and median we have, for any $n$,

$$ \operatorname{var}(\text{mean}) = \frac{\sigma^2}{n}, \tag{17.14} $$

and for large $n$

$$ \operatorname{var}(\text{median}) = \frac{\pi\sigma^2}{2n}, \tag{17.15} $$

where $\sigma^2$ is the parent variance. Since $\pi/2 = 1.57 > 1$ the mean is more efficient than
the median for large $n$ at least. For small $n$ we have to work out the variance of the median.
The following values may be obtained from those given in Table XXIII of Tables for
Statisticians and Biometricians, Part II:

n                          2      3      4      5
n·var(median)/σ²           1.00   1.36   1.19   1.44

It appears that the mean is always more efficient than the median in estimating the
parameter $\theta$ for the normal distribution (17.1).
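For odd $n$ the small-sample variance of the median can be computed without tables. In the sketch below, which uses hypothetical function names, the median of $n = 2k+1$ unit-normal values is the $(k+1)$th order statistic, with density $\frac{n!}{(k!)^2}F^k(1-F)^k f$. Its variance (the mean is zero by symmetry) is found by trapezoidal integration over a wide finite range. For $n = 3$ the exact value $1 - \sqrt{3}/\pi$ is known and serves as a check. As $n$ grows, $n\operatorname{var}(\text{median})/\sigma^2$ should rise towards $\pi/2$. Even $n$, where the median is the mean of two order statistics, is not handled here.

```python
import math

def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def median_variance(n, lo=-10.0, hi=10.0, steps=20000):
    """Variance of the median of n (odd) independent unit-normal values."""
    if n % 2 == 0:
        raise ValueError("this sketch handles odd n only")
    k = (n - 1) // 2
    const = math.factorial(n) / math.factorial(k) ** 2
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        F = std_normal_cdf(x)
        density = const * (F * (1.0 - F)) ** k * std_normal_pdf(x)
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal end weights
        total += weight * x * x * density          # mean is 0 by symmetry
    return total * h

if __name__ == "__main__":
    for n in (3, 5, 7, 9, 21):
        print(n, round(n * median_variance(n), 3))
    print("limit pi/2 =", round(math.pi / 2, 3))
```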
Example 17.4

For the Cauchy distribution

$$ dF = \frac{1}{\pi}\,\frac{dx}{1+(x-\theta)^2}, \qquad -\infty < x < \infty, $$

we have already seen that the sample mean is not a consistent estimator. However, for
the median in large samples we have, since the median ordinate is $1/\pi$,

$$ \operatorname{var}(\text{median}) = \frac{\pi^2}{4n}. $$


It is seen that the median is consistent, and although direct comparison with the mean
is not possible because the latter does not possess a sampling variance, the median is
evidently a better estimator for $\theta$ than the mean. This provides an interesting contrast with
the case of the normal parent, particularly in view of the similarity of the parent
frequency-distributions.
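A short simulation illustrates the contrast. This is a sketch, and the seed, replication count and function names are arbitrary choices. Because the Cauchy mean has no variance, the spread of each estimator over repeated samples is measured by its interquartile range instead. By (17.5), the interquartile range of the mean stays at 2 whatever the value of $n$. From the large-sample variance above, the interquartile range of the median should shrink roughly like $1.349\,\pi/(2\sqrt{n})$.

```python
import math
import random
import statistics

def cauchy_sample(rng, theta, n):
    """n values from the Cauchy population (17.4) by the inverse-CDF method:
    theta + tan(pi * (U - 1/2)) with U uniform on [0, 1)."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]

def interquartile_range(values):
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

def spread_of_estimators(n, reps=400, theta=0.0, seed=1):
    """Interquartile ranges of the sample mean and the sample median
    over `reps` independent Cauchy samples of size n."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = cauchy_sample(rng, theta, n)
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return interquartile_range(means), interquartile_range(medians)

if __name__ == "__main__":
    for n in (10, 100, 1000):
        iqr_mean, iqr_median = spread_of_estimators(n)
        print(f"n={n:5d}  IQR(mean)={iqr_mean:.3f}  IQR(median)={iqr_median:.3f}")
```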


17.12. In some cases, as we shall see below, there exist consistent estimators whose
sampling variance for large samples is less than that of any other such estimator. We
shall call such estimators most-efficient. When they exist they provide a standard of
measurement of efficiency. In fact, if $t_2$ has variance $v_2$ and the most-efficient estimator
$t_1$ has variance $v_1$, the efficiency $E$ of $t_2$ is defined as

$$ E = \frac{v_1}{v_2}. \tag{17.16} $$


It will be seen later that in normal samples the mean is a most-efficient estimator, so that
the efficiency of the median for such samples is

$$ E = \frac{\sigma^2/n}{\pi\sigma^2/(2n)} = \frac{2}{\pi} = 0.637. $$


17.13. If we have a sample of 100 members the variance of the median (assuming 
normality) will be about the same as that of the mean in only 64 members. Thus, if 
sampling variance be accepted as a criterion of accuracy of estimation, the use of the median 
instead of the mean sacrifices about 36 observations in 100. It is not possible to economise
by using a different estimator than the mean. 
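The arithmetic of 17.12 and 17.13 can be written out directly. The sketch below uses hypothetical helper names. It applies (17.16) to the large-sample variances (17.14) and (17.15), then converts the efficiency into the number of observations the most-efficient estimator would need to match the median's variance.

```python
import math

def efficiency(v_most_efficient, v_other):
    """Efficiency (17.16): E = v1 / v2."""
    return v_most_efficient / v_other

def equivalent_sample_size(n_other, eff):
    """Observations needed by the most-efficient estimator to match the
    variance of the other estimator based on n_other observations,
    assuming both variances are proportional to 1/n."""
    return eff * n_other

sigma2, n = 1.0, 100
e_median = efficiency(sigma2 / n, math.pi * sigma2 / (2 * n))  # (17.14), (17.15)
print(round(e_median, 3))                              # 0.637
print(round(equivalent_sample_size(n, e_median), 1))   # 63.7
```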

Other things being equal, the estimator with the greater efficiency is undoubtedly
the one to use. But sometimes other things are not equal. It may, and does, happen
that a most-efficient estimator $t_1$ is more troublesome to calculate than an
alternative $t_2$. The extra labour involved in calculation may be greater than the saving
in dealing with a smaller sample number, particularly if there are plenty of further
observations to hand.


Example 17.5

Consider the estimation of the standard deviation of a normal population with variance
$\sigma^2$ and unknown mean. Two possible estimators are the standard deviation of the sample
(or the square root of $\sum(x-\bar{x})^2/(n-1)$ if it is desired to use an unbiassed estimator)
and the mean deviation of the sample multiplied by $\sqrt{\pi/2}$ (cf. 5.20). The latter is
easier to calculate, as a rule, and if we have plenty of observations (as, for example, if we
are finding the standard deviation of a set of barometric records and the addition of further
members to the sample is merely a matter of turning up more records) it may be worth
while estimating from the mean deviation rather than from the standard deviation.

In normal samples the variance of the mean deviation is (9.13)

$$ \frac{2\sigma^2}{\pi n}\left(1-\frac{1}{n}\right)\left\{\frac{\pi}{2} + \sqrt{n(n-2)} - n + \arcsin\frac{1}{n-1}\right\}. \tag{17.17} $$

The variance of the estimator from the mean deviation is then approximately

$$ \frac{\sigma^2(\pi-2)}{2n}. \tag{17.18} $$


Now the variance of the standard deviation is (9.22) $\sigma^2/2n$, and we shall see later that it
is a most-efficient estimator. Thus the efficiency of the estimator from the mean deviation is

$$ E = \frac{\sigma^2/(2n)}{\sigma^2(\pi-2)/(2n)} = \frac{1}{\pi-2} = 0.876. $$


The accuracy of the estimate from the mean deviation of a sample of 1000 is then about 
the same as that from the standard deviation of a sample of 876. If it is easier to calculate 
the m.d. of 1000 observations than the s.d. of 876 and there is no shortage of observations, 
it may be more convenient to use the former. 

It has to be remembered, nevertheless, that in adopting such a procedure we are
deliberately wasting information. By taking greater pains we could improve the efficiency
of our estimate from 0.876 to unity, or by about 14 per cent of the former value.
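The figure 0.876 can be checked from the exact formula (17.17). The sketch below uses hypothetical function names. It evaluates (17.17) and multiplies the result by $\pi/2$ to get the variance of $\sqrt{\pi/2}$ times the mean deviation. It then compares this with the large-sample variance $\sigma^2/2n$ of the standard deviation. As $n$ grows, $2n$ times the estimator's variance should approach $\pi - 2$ and the efficiency should approach $1/(\pi-2)$. For $n = 2$ the formula reduces to $\sigma^2(1/2 - 1/\pi)$, which agrees with a direct calculation because the mean deviation is then $|x_1 - x_2|/2$.

```python
import math

def var_mean_deviation(n, sigma=1.0):
    """Exact variance (17.17) of the mean deviation about the sample mean
    in normal samples of size n >= 2."""
    bracket = math.pi / 2 + math.sqrt(n * (n - 2)) - n + math.asin(1.0 / (n - 1))
    return 2 * sigma ** 2 / (math.pi * n) * (1 - 1 / n) * bracket

def var_sd_estimator_from_md(n, sigma=1.0):
    """Variance of sqrt(pi/2) * (mean deviation): multiplying by sqrt(pi/2)
    multiplies the variance by pi/2."""
    return (math.pi / 2) * var_mean_deviation(n, sigma)

if __name__ == "__main__":
    for n in (10, 100, 1000, 100000):
        v = var_sd_estimator_from_md(n)
        eff = (1 / (2 * n)) / v    # large-sample variance of the s.d. is sigma^2/(2n)
        print(n, round(2 * n * v, 4), round(eff, 4))
    print("limits:", round(math.pi - 2, 4), round(1 / (math.pi - 2), 4))
```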


Sufficient Estimators 

17.14. The comparison of the efficiencies of two estimators, as measured by their
variances, may be made for any $n$, but the absolute efficiency as defined in 17.12 by relation
to a most-efficient estimator is in the main a limiting property. We shall see below (17.36)
that the definition may be extended to small samples and to non-normal variation, but
most-efficient estimators for finite $n$ do not exist so frequently in statistical practice
as in the limiting case of large samples. Sometimes, however, there are estimators which
may be regarded as the “best” for samples of any size, and we proceed to consider
them.

Before doing so, we prove that, in the limit, all most-efficient estimators tend to be
equivalent.

More precisely, if two most-efficient estimators $t_1$ and $t_2$ tend in the limit to be
distributed in the bivariate form

$$ dF \propto \exp\left[-\frac{1}{2v(1-\rho^2)}\left\{(t_1-\theta)^2 - 2\rho(t_1-\theta)(t_2-\theta) + (t_2-\theta)^2\right\}\right] dt_1\,dt_2, \tag{17.19} $$

then the correlation $\rho = 1$. Here $v$ is the variance of each estimator.

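Consider the estimator

$$ u_1 = \tfrac{1}{2}(t_1 + t_2). $$

Clearly $u_1$ is consistent since $t_1$ and $t_2$ are both so. Putting

$$ u_2 = \tfrac{1}{2}(t_1 - t_2) $$

we have, for the joint distribution of $u_1$ and $u_2$,

$$ dF \propto \exp\left[-\frac{1}{2v(1-\rho^2)}\left\{2(1-\rho)(u_1-\theta)^2 + 2(1+\rho)u_2^2\right\}\right] du_1\,du_2. \tag{17.20} $$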
Thus $u_2$ is distributed independently of $u_1$ and $\theta$, and we have

$$ \operatorname{var} u_1 = \frac{v(1-\rho^2)}{2(1-\rho)} = \frac{1+\rho}{2}\,v. $$

Now $t_1$ is a most-efficient estimator and hence

$$ \frac{1+\rho}{2}\,v = \operatorname{var} u_1 \geq \operatorname{var} t_1 = v, \tag{17.21} $$

giving

$$ \frac{1+\rho}{2} \geq 1. \tag{17.22} $$

But $\rho$ cannot be greater than unity and hence $\rho = 1$, which proves the theorem.


17.15. Consider once again the estimation of $\theta$ in the normal population (17.1).
The joint distribution of the sample is given by

$$ dF = \frac{1}{(2\pi)^{n/2}}\exp\left\{-\tfrac{1}{2}\sum_{j=1}^{n}(x_j-\theta)^2\right\} dx_1 \ldots dx_n. \tag{17.23} $$

We have the familiar result

$$ \sum_{j=1}^{n}(x_j-\theta)^2 = \sum_{j=1}^{n}(x_j-\bar{x})^2 + n(\bar{x}-\theta)^2, $$

and hence

$$ dF = \frac{1}{(2\pi)^{n/2}}\exp\left\{-\tfrac{n}{2}(\bar{x}-\theta)^2\right\}\exp\left\{-\tfrac{1}{2}\sum_{j=1}^{n}(x_j-\bar{x})^2\right\} dx_1 \ldots dx_n. \tag{17.24} $$

Thus the frequency function of the distribution of $x$'s (which is equivalent to the likelihood
function) can be factorised into two parts, one depending on $\bar{x}$ and $\theta$, the other depending
on the $x$'s but not on $\theta$.

The quantity $\bar{x}$ is then said to be a sufficient estimator of $\theta$; and generally, if the
likelihood function is expressible in the form (as a product of two frequency functions)

$$ L(x_1, \ldots, x_n, \theta) = L_1(t, \theta)\, L_2(x_1, \ldots, x_n), \tag{17.25} $$

where $L_1$ does not contain the $x$'s otherwise than in the form $t$ and $L_2$ is independent of $\theta$,
$t$ is said to be a sufficient estimator of $\theta$.
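The factorisation (17.24) can be verified numerically. In the sketch below (function names and data are hypothetical), the log-likelihood of a unit-variance normal sample is computed in two ways: directly from (17.23), and as the sum of the logarithms of the two factors in (17.24). Two samples with the same mean then have log-likelihoods differing by an amount that does not depend on $\theta$. This is what the sufficiency of $\bar{x}$ means here: the data enter the $\theta$-dependent factor only through $\bar{x}$.

```python
import math

def log_likelihood(xs, theta):
    """log L from (17.23): unit-variance normal parent with mean theta."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in xs)

def log_likelihood_factorised(xs, theta):
    """The same quantity as log L1(xbar, theta) + log L2(x_1, ..., x_n), as in (17.24)."""
    n = len(xs)
    xbar = sum(xs) / n
    log_l1 = -0.5 * n * (xbar - theta) ** 2                  # data enter only via xbar
    log_l2 = (-0.5 * n * math.log(2 * math.pi)
              - 0.5 * sum((x - xbar) ** 2 for x in xs))      # free of theta
    return log_l1 + log_l2

if __name__ == "__main__":
    a = [0.3, 1.1, 2.0, -0.4]      # mean 0.75
    b = [0.75, 0.75, 0.75, 0.75]   # same mean, different spread
    for theta in (-2.0, 0.0, 0.75, 4.0):
        print(theta, log_likelihood(a, theta) - log_likelihood(b, theta))
```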


17.16. As so defined, a sufficient estimator, if it exists at all, is unique except that
if $t$ obeys the relation (17.25) any function of $t$ will obviously also obey the same relation.
From all such functions we must evidently choose one which gives a consistent estimator,
and we can sometimes, as in the example of the previous section, find the estimator which is
unbiassed. Apart from such ambiguities, which offer no difficulties in practice, the property
of uniqueness holds. For if $t_1$ and $t_2$ were two different sufficient statistics, not functionally
related, we should have

$$ L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n) = M_1(t_2, \theta)\, M_2(x_1, \ldots, x_n), $$

and hence

$$ \frac{L_1(t_1, \theta)}{M_1(t_2, \theta)} = \frac{M_2(x_1, \ldots, x_n)}{L_2(x_1, \ldots, x_n)}. \tag{17.26} $$

Since the expression on the right does not contain $\theta$, $L_1$ must be a factor of $M_1$ and
moreover the quotient must be a constant; for if it were a function of the $x$'s that function
would have been assimilated to $L_2$ or $M_2$.
Hence

$$ L_1(t_1, \theta) = k\, M_1(t_2, \theta), $$

and this cannot be so unless $t_1$ and $t_2$ are functionally related.


17.17. The fundamental property of sufficient estimators derives from the following
theorem:

If $t_1$ is sufficient and $t_2$ is any other estimator of $\theta$ (not a function of $t_1$) the joint
distribution of $t_1$ and $t_2$ may be put in the form

$$ dF = f_1(t_1, \theta)\, f_2(t_1, t_2)\, dt_1\, dt_2, \tag{17.27} $$

where $f_2$ does not contain $\theta$. Conversely, if (17.27) holds for every $t_2$ then $t_1$ is sufficient.

Before proving this result let us notice its importance. From (17.27) it follows that
for any given $t_1$ the distribution of $t_2$ is proportional to $f_2(t_1, t_2)\,dt_2$, i.e. is independent
of $\theta$. Consequently, if we know $t_1$, the probability of any range of values of $t_2$ is the same
for all $\theta$. The distribution of $t_2$ given $t_1$, therefore, can throw no light whatever on $\theta$. Thus,
a knowledge of $t_1$ gives all the information that the sample can supply about $\theta$ and no other
estimator can add anything to it. We are clearly justified in such circumstances in
describing a sufficient estimator as the “best”.
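The point can be seen exactly in a discrete case. The sketch below uses a Bernoulli parent. That population is not discussed in this chapter and is chosen only because every sample can be enumerated. Take samples of $n$ zeros and ones with $P(x=1)=p$. The total $t_1 = \sum x$ is sufficient, and the conditional distribution of any other statistic $t_2$ given $t_1$ should come out the same for every $p$. The function names and the two choices of $t_2$ (the first observation, and the number of runs) are hypothetical.

```python
from fractions import Fraction
from itertools import product

def conditional_distribution(n, p, s, t2):
    """P{t2 = v | t1 = s} for samples of n Bernoulli(p) values, t1 = sum."""
    weights = {}
    total = Fraction(0)
    for xs in product((0, 1), repeat=n):
        if sum(xs) != s:
            continue
        prob = p ** s * (1 - p) ** (n - s)   # the same for every sample with t1 = s
        total += prob
        v = t2(xs)
        weights[v] = weights.get(v, Fraction(0)) + prob
    return {v: w / total for v, w in sorted(weights.items())}

def first_value(xs):
    return xs[0]

def number_of_runs(xs):
    return 1 + sum(1 for a, b in zip(xs, xs[1:]) if a != b)

if __name__ == "__main__":
    for p in (Fraction(1, 5), Fraction(1, 2), Fraction(3, 4)):
        print(p, conditional_distribution(5, p, 2, first_value),
              conditional_distribution(5, p, 2, number_of_runs))
```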

Now as to the theorem itself. The direct part is easily proved. In fact, we have from
(17.25)

$$ L(x_1, \ldots, x_n, \theta)\, dx_1 \ldots dx_n = L_1(t_1, \theta)\, L_2(x_1, \ldots, x_n)\, dx_1 \ldots dx_n. \tag{17.28} $$

Make the transformation

$$ y_1 = t_1(x_1, \ldots, x_n), \quad y_2 = t_2(x_1, \ldots, x_n), \quad y_3 = x_3, \quad \ldots, \quad y_n = x_n. $$

The element of frequency becomes

$$ L_1(y_1, \theta)\, L_2(x_1, \ldots, x_n) \left|\frac{\partial(x_1, x_2)}{\partial(t_1, t_2)}\right| dy_1 \ldots dy_n, \tag{17.29} $$

where the $t$'s and $x$'s are to be expressed in terms of the $y$'s. We have excluded the case
when $t_2$ is functionally related to $t_1$, and hence the Jacobian $\partial(x_1, x_2)/\partial(t_1, t_2)$ does not
vanish identically. The frequency element of $y_1$ and $y_2$ is then obtained from (17.29) by
integrating out the other variables. Since $y_1$ and $y_2$ are equal respectively to $t_1$ and $t_2$
this process will leave unchanged the function $L_1(t_1, \theta)$ and reduce the other part to a
function of $t_1$ and $t_2$, say $f_2(t_1, t_2)$. Writing $f_1$ for $L_1$ we then have

$$ dF = f_1(t_1, \theta)\, f_2(t_1, t_2)\, dt_1\, dt_2, $$

as stated in the theorem.

The converse is a little more difficult. Let (17.27) hold for every $t_2$, and make the
transformation $y_1 = t_1$, $y_2 = x_2$, etc. The joint distribution of sample values becomes

$$ L(x_1, \ldots, x_n, \theta) = L'(t_1, y_2, \ldots, y_n, \theta)\left|\frac{\partial t_1}{\partial x_1}\right|. \tag{17.30} $$

Since $t_1$ is independent of $\theta$, so is $\partial t_1/\partial x_1$. Hence, if the distribution of $t_1$ is $f(t_1, \theta)\,dt_1$,
$L'$ may be written

$$ L' = f(t_1, \theta)\, L''(t_1, y_2, \ldots, y_n), \tag{17.31} $$

and the converse will be established if we can show that $L''$ does not contain $\theta$. This we
do by demonstrating that if there are values $y_2', \ldots, y_n'$ for which $L''$ assumes different
values for different values of $\theta$, then the distribution of $t_2$ for given $t_1$ cannot be independent
of $\theta$, which contradicts our hypothesis.

Suppose, then, that for two values of $\theta$, say $\theta_1$ and $\theta_2$,

$$ L''(t_1, y_2', \ldots, y_n')_{\theta_1} = L''(t_1, y_2', \ldots, y_n')_{\theta_2} + 2a, \tag{17.32} $$

where $a$ is not zero; without loss of generality we may take $a$ positive. Consider a new
statistic $t_2$ defined by

$$ t_2 = \sum_{j=2}^{n}(y_j - y_j')^2. \tag{17.33} $$

Assuming that $L''$ is continuous in the $y$'s, we may determine a value of $t_2$, say $t_2'$, such that

$$ L''(t_1, y_2, \ldots, y_n)_{\theta_1} > L''(t_1, y_2, \ldots, y_n)_{\theta_2} + a \tag{17.34} $$

everywhere inside the range of values bounded by

$$ t_2' = \sum_{j=2}^{n}(y_j - y_j')^2. $$

Then for any fixed $t_1$ the total frequency inside this range is obtained by integrating $L''$
over the appropriate values, and we shall find, in virtue of (17.34),

$$ f_{\theta_1} > f_{\theta_2}, \tag{17.35} $$

the $f$'s referring to total frequencies.

But if the distribution of $t_2$ for given $t_1$ is

$$ dF = h(t_1, t_2)_{\theta}\, dt_2, $$

we have for the frequencies $f$,

$$ f_{\theta_1} = \int_0^{t_2'} h(t_1, t_2)_{\theta_1}\, dt_2, \qquad f_{\theta_2} = \int_0^{t_2'} h(t_1, t_2)_{\theta_2}\, dt_2, $$

and hence

$$ \int_0^{t_2'}\left\{h(t_1, t_2)_{\theta_1} - h(t_1, t_2)_{\theta_2}\right\} dt_2 > 0, $$

so that this distribution cannot be independent of $\theta$.

The above demonstration relates to the case when the frequency functions are
continuous. In the discontinuous case the argument simplifies and we leave it to the reader
to supply the proof.

17.18. We now prove an important further result to the effect that a sufficient
estimator is most-efficient, provided that a most-efficient estimator exists. We assume
that the joint distribution of the sufficient estimator $t_1$ and any other estimator $t_2$ tends
to normality for large $n$, say in the form

$$ dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(t_1-\theta)^2}{v_1} - \frac{2\rho(t_1-\theta)(t_2-\theta)}{\sqrt{v_1 v_2}} + \frac{(t_2-\theta)^2}{v_2}\right\}\right] dt_1\, dt_2, \tag{17.36} $$

where $v_1$ and $v_2$ are the variances of $t_1$ and $t_2$ respectively. Since $t_1$ is sufficient, the
distribution of $t_2$ given $t_1$ does not contain $\theta$. Now the distribution of $t_1$ is

$$ dF \propto \exp\left\{-\frac{(t_1-\theta)^2}{2v_1}\right\} dt_1, \tag{17.37} $$
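and hence that of $t_2$ given $t_1$ is

$$ dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(t_1-\theta)^2}{v_1} - \frac{2\rho(t_1-\theta)(t_2-\theta)}{\sqrt{v_1 v_2}} + \frac{(t_2-\theta)^2}{v_2}\right\} + \frac{(t_1-\theta)^2}{2v_1}\right] dt_2, $$

which reduces to

$$ dF \propto \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{t_2-\theta}{\sqrt{v_2}} - \rho\,\frac{t_1-\theta}{\sqrt{v_1}}\right\}^2\right] dt_2. \tag{17.38} $$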


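If this is not to involve $\theta$, the coefficient of $\theta$ within the braces, $\rho/\sqrt{v_1} - 1/\sqrt{v_2}$,
must vanish, so that we must have

$$ \rho = \sqrt{\frac{v_1}{v_2}} = \sqrt{E}, \quad \text{where } E \text{ is the efficiency of } t_2. \tag{17.39} $$

Since $\rho \leq 1$ it follows that $v_1 \leq v_2$, i.e. $t_1$ has a smaller variance than any other estimator.
Consequently, if there exists a most-efficient statistic, $t_1$ itself is most-efficient.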
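The relation (17.39) can be illustrated by simulation for normal samples. There the mean is sufficient and the median has efficiency about $2/\pi$ in large samples, so their correlation should be near $\sqrt{2/\pi} = 0.798$. The sketch below estimates both the correlation and the ratio of variances from the same replications. The seed, sample size, replication count and function names are arbitrary choices.

```python
import math
import random
import statistics

def correlation(u, v):
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

def mean_median_correlation(n=101, reps=4000, seed=7):
    """Correlation of sample mean and median over repeated unit-normal samples,
    together with the estimated efficiency var(mean)/var(median)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    eff = statistics.variance(means) / statistics.variance(medians)
    return correlation(means, medians), eff

if __name__ == "__main__":
    rho, eff = mean_median_correlation()
    print(round(rho, 3), round(math.sqrt(eff), 3), round(math.sqrt(2 / math.pi), 3))
```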

17.19. The criterion of sufficiency is not a limiting property. A sufficient estimator
is best for any sample size, since it gives all the information about $\theta$ that the sample can
give; and it is most-efficient for large samples. If we could always find a sufficient
estimator our problem would be solved, but unfortunately sufficiency is the exception
rather than the rule.

Example 17.6

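The frequency element of a sample of $n$ from the population

$$ dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\} dx $$

can be put in the form

$$ dF = \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{n(\bar{x}-m)^2}{2\sigma^2}\right\}\;\frac{n^{\frac{n-1}{2}}}{(2\sigma^2)^{\frac{n-1}{2}}\,\Gamma\left(\frac{n-1}{2}\right)}\,(s^2)^{\frac{n-3}{2}}\exp\left(-\frac{ns^2}{2\sigma^2}\right) c\; d\bar{x}\, ds^2. $$

(Cf. Example 10.5, vol. I, p. 238.)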

If we know $\sigma$, then, as we have already seen, $\bar{x}$ is sufficient for $m$. But if we know
$m$, $s$ is not sufficient for $\sigma$. In fact, the factorisation in the above equation requires the
appearance of $\sigma$ in the element relating to $\bar{x}$, and we cannot split off a factor containing
$s$ and $\sigma$ alone from a factor containing the remaining variables alone.

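This is what we might expect. If we know the real mean $m$ there is little point in
preferring the sample variance

$$ s^2 = \frac{1}{n}\sum(x-\bar{x})^2 $$

to the second moment about $m$,

$$ s'^2 = \frac{1}{n}\sum(x-m)^2, $$

as an estimator of the parent variance. The distribution of $s'^2$ is given by

$$ dF = \frac{n^{\frac{n}{2}}}{(2\sigma^2)^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}\,(s'^2)^{\frac{n-2}{2}}\exp\left(-\frac{ns'^2}{2\sigma^2}\right) ds'^2, $$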
and this embodies the whole of the frequency element of the sample, apart from differentials
in the other variables. Thus $s'$ is sufficient for $\sigma$.
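Example 17.6 can be checked numerically in the same spirit. With $m$ known, the log-likelihood depends on the data only through $\sum(x-m)^2 = ns'^2$. Two samples with equal $s'$ should therefore have identical log-likelihoods for every $\sigma$. Two samples with equal $s$ but different $s'$ should not. The data and the function name below are illustrative assumptions.

```python
import math

def log_likelihood_known_mean(xs, m, sigma):
    """log L for a normal sample with known mean m and unknown sigma."""
    n = len(xs)
    return (-n * math.log(sigma) - 0.5 * n * math.log(2 * math.pi)
            - sum((x - m) ** 2 for x in xs) / (2 * sigma ** 2))

if __name__ == "__main__":
    m = 0.0
    a = [1.0, -1.0, 2.0, -2.0]   # sum of (x - m)^2 = 10, mean 0
    b = [3.0, 1.0, 0.0, 0.0]     # sum of (x - m)^2 = 10, mean 1
    c = [2.0, 0.0, 3.0, -1.0]    # a shifted by 1: same s as a, sum of (x - m)^2 = 14
    for sigma in (0.5, 1.0, 3.0):
        print(sigma,
              log_likelihood_known_mean(a, m, sigma) - log_likelihood_known_mean(b, m, sigma),
              log_likelihood_known_mean(a, m, sigma) - log_likelihood_known_mean(c, m, sigma))
```

The first difference is zero for every $\sigma$. The second difference equals $2/\sigma^2$, so it varies with $\sigma$: knowing $s$ alone does not fix the likelihood in $\sigma$.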

17.20. This completes the first stage of our inquiry. The criteria of consistence,
efficiency and sufficiency provide standards which we shall look for in “good” estimators.
Of themselves, however, they do not provide any systematic way of deriving estimators
which obey them. We shall now consider various methods which have been proposed for
providing estimators and examine how far they conform to our criteria. The most
important method is that of maximum likelihood, which will occupy the remainder of this
chapter. In the next chapter we shall consider four others, among them the method of
minimum variance, the method of least squares and the method of inverse probability.


Maximum Likelihood 

17.21. If the frequency function of the parent population is $f(x, \theta)$, the likelihood
function of a sample of $n$ is, by definition,

$$ L = f(x_1, \theta)\, f(x_2, \theta) \ldots f(x_n, \theta). \tag{17.40} $$

The Principle of Maximum (or Maximal) Likelihood then states that if there exists a statistic 
t = t (xi, . . . x„) which maximises L for variations of 6, then t is to be taken as an 
estimator of 6. In short, t is the solution (if any) of 




30 * 


< 0 . 


. (17.41) 


Since L is positive, the first equation is equivalent to 


1 dL 
L 30 


-ilogi-O, 


. (17.42) 


a form which is frequently more convenient. 
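The form (17.42) is also the one used in practice when the likelihood is computed numerically. A minimal sketch (sample size, seed and the normal density are arbitrary choices for illustration): the raw product $L$ of 2000 normal ordinates underflows to zero in double precision, while $\log L$ is an ordinary finite number.

```python
import math
import random

rng = random.Random(1)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]


def normal_pdf(x, theta):
    """Density of N(theta, 1) at x."""
    return math.exp(-0.5 * (x - theta) ** 2) / math.sqrt(2 * math.pi)


# L as a raw product of n densities underflows double precision to exactly 0.0 ...
product = 1.0
for x in xs:
    product *= normal_pdf(x, 0.0)

# ... while log L, as a sum of logs, stays comfortably finite.
log_likelihood = sum(math.log(normal_pdf(x, 0.0)) for x in xs)

print(product, round(log_likelihood, 1))
```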

There is one small point to notice here. In our usual convention, if a frequency function has a finite range, we regard it as defined from $-\infty$ to $+\infty$ but as zero outside that range. In this chapter we shall occasionally meet the reciprocal of $f$, which is undefined for zero $f$. Unless the contrary is specified we shall suppose that where $f$ is zero $1/f$ is also to be regarded as zero. This will enable us to continue to regard the range as infinite, but some care is necessary where $f$ is assumed everywhere continuous, for discontinuities may appear in $f$ and $1/f$ at the terminals of the finite range. The point becomes important when we try to make certain existence theorems rigorous.


17.22. In sections 7.27 to 7.31 we touched on the principle of maximum likelihood from the point of view of statistical logic. We pointed out that its adoption required a new postulate in the theory of inference, but referred to the fact that the principle was recommended by the statistical properties of the estimators to which it leads. We now proceed to prove a series of theorems about these estimators, from which it will be seen that the posterior recommendation, so to speak, is very strong. In fact, maximum likelihood estimators are consistent, tend to normality for large $n$, have minimum variance, in the limit at least, and provide sufficient statistics if such exist.

17.23. The reader may feel convinced intuitively that maximum likelihood estimators are consistent, in which case he can pass to the next section. We shall now prove the result formally.

(a) If the frequency function $f(x, \theta)$ is continuous in $x$ throughout its range, and

(b) if $f(x, \theta)$ is continuous and monotonic in $\theta$ in some $\theta$-interval containing the true value of $\theta$, say $\theta_0$, and for all $x$ in some $x$-interval,

then the maximum likelihood estimator of $\theta$, say $t$, is consistent.

Our proof will also cover the case of discontinuous variates which can be reduced to 
the continuous case by replacing each value by an interval in which the frequency is 
uniformly distributed. 

We first eliminate an inconvenience due to the infinitude of the range. In fact, if the 
range is infinite we make the variate transformation $x = \tan y$. The conditions (a) and (b)
remain true of y, and the maximum likelihood estimator in x transforms to that in y. We 
may therefore take the range as finite. 

The next step is to reduce the case to one of grouped frequencies by dividing the range into $m$ intervals, the width of the $j$th interval being $l_j$. (We shall decide on the actual values of the $l$'s below.) Writing

$$f_j = \int_{l_j} f(x, \theta)\, dx, \qquad (17.43)$$

where the integral is taken over the $j$th interval, we have, in virtue of the continuity of $f$ in $x$, that $f_j/l_j$ differs as little as we please from $f(x_j, \theta)$, $x_j$ being any point of that interval. Then if $L'$ is the likelihood of the grouped data, proportional to

$$\prod_{j=1}^{m} f_j^{\,n_j}, \qquad (17.44)$$

where $n_j$ is the number of observations in the $j$th interval, we have, except for constants,

$$\log L' = \sum_j n_j \log f_j - \sum_j n_j \log l_j, \qquad (17.45)$$

and this will differ arbitrarily little from the logarithm of the true likelihood

$$\log L = \sum_{i=1}^{n} \log f(x_i, \theta), \qquad (17.46)$$

provided that we take $m$ large enough and the $l$'s in consequence small enough.

Hence we see that if $t$ is the estimator which maximises $L$ and $t'$ that which maximises $L'$, then, in virtue of hypothesis (b) that $L$ and $L'$ are continuous in $\theta$, $t$ and $t'$ will differ as little as we please for any given values of the $x$'s, and that uniformly. We may therefore prove our theorem for the finite number of variables $n_j$ and infer its truth for the continuous case by proceeding to the limit.

In different samples the $n_j$ will vary, subject only to the condition that $\sum_j n_j = n$. Let us choose the ranges $l_j$ such that $f_j(\theta_0) = 1/m$ for all $j$, that is to say, such that the frequencies in all intervals are equal when $\theta$ takes its true value $\theta_0$. Consider the likelihood function

$$K = \sum_j n_j \log z_j, \qquad (17.47)$$

where the $z$'s are subject only to the condition

$$\sum_j z_j = 1. \qquad (17.48)$$
We consider three values of $K$ defined by particular values of the $z$'s.

(a) When $z_j = n_j/n$, $K$ is a maximum, say $K_R$. For we have, at a stationary value subject to (17.48),

$$0 = \sum_j \frac{n_j}{z_j}\, dz_j, \qquad \sum_j dz_j = 0,$$

and hence

$$\frac{n_1}{z_1} = \frac{n_2}{z_2} = \cdots = \frac{\sum (n)}{\sum (z)} = n.$$

(b) When $z_j = f_j(\theta_0) = 1/m$, $K$ is, say, $K_M$.

(c) When the estimator $t'$ assumes the value corresponding to the observed $n_j$, and hence $z_j = f_j(t')$, $K$ is a maximum, say $K_Z$, among the particular set of values for which $z_j = f_j(\theta)$; for this is our definition of $t'$.

We have at once that 

$$K_R \geq K_Z \geq K_M. \qquad (17.49)$$
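A small numerical check of case (a) and of the inequality $K_R \geq K_M$ in (17.49), using hypothetical interval frequencies $n_j$: $K$ evaluated at $z_j = n_j/n$ is compared with $K$ at many randomly chosen points of the hyperplane $\sum z_j = 1$. This is a sketch of the inequality only, not of the consistency argument itself.

```python
import math
import random


def K(counts, z):
    """K = sum_j n_j log z_j, as in (17.47)."""
    return sum(n_j * math.log(z_j) for n_j, z_j in zip(counts, z))


counts = [12, 7, 9, 22]          # hypothetical interval frequencies n_j, n = 50, m = 4
n, m = sum(counts), len(counts)

K_R = K(counts, [n_j / n for n_j in counts])   # case (a): z_j = n_j / n
K_M = K(counts, [1.0 / m] * m)                 # case (b): z_j = 1/m

# Compare with many other points of the hyperplane sum(z) = 1, z_j > 0
# (normalised exponential variates are uniform on that simplex).
rng = random.Random(0)
best_other = -math.inf
for _ in range(20000):
    g = [rng.expovariate(1.0) for _ in range(m)]
    total = sum(g)
    best_other = max(best_other, K(counts, [g_j / total for g_j in g]))

print(round(K_R, 4), round(K_M, 4), round(best_other, 4))
```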

Now, as the sample increases, the observed $n_j/n$ converge in probability to their theoretical values $f_j(\theta_0) = 1/m$. Since $K/n$ is continuous in the $z$'s and in the $n_j/n$, $(K_R - K_M)/n$ will converge to zero in probability and, from (17.49), so will $(K_R - K_Z)/n$.

Now we show that this entails that each of

$$\left| f_j(t') - f_j(\theta_0) \right|$$

converges to zero in probability. In fact, since $|n_j/n - f_j(\theta_0)|$ does so, it will be enough to prove that the same holds for

$$\left| \frac{n_j}{n} - f_j(t') \right|. \qquad (17.50)$$

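Let $K_1$ be the maximum of $K$ for some fixed $z_1$. Taking $z_1 = f_1(t')$, we have $K_R \geq K_1 \geq K_Z$, and

$$K_R - K_Z \geq K_R - K_1.$$

Hence $(K_R - K_1)/n$ converges to zero. The maximum is readily seen to be given by

$$z_j = \frac{(1 - z_1)\, n_j}{n - n_1}, \qquad j = 2, \ldots, m, \qquad (17.51)$$

so that

$$K_1 = n_1 \log z_1 + (n - n_1)\{\log(1 - z_1) - \log(n - n_1)\} + \sum_{j=2}^{m} n_j \log n_j. \qquad (17.52)$$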

Now $z_1$ is a double-valued function of $K_1$, continuous and having its two values equal for $K_1 = K_R$; for $K_1$ is continuous in $z_1$ from 0 to 1 (not inclusive), and $dK_1/dz_1$ changes sign only for $z_1 = n_1/n$, where $K_1 = K_R$. It follows that when $(K_R - K_1)/n$ is small, so is $z_1 - n_1/n$. If the other $z$'s are not given by (17.51), $K$ is smaller than $K_1$, so that $K_R - K$ is larger still.

A similar argument applies for any $j$, and hence $|z_j - n_j/n|$ converges to zero in probability when $(K_R - K)/n$ does so. Taking $z_j = f_j(t')$ and remembering that in this case $K$ becomes $K_Z$, we reach (17.50).

Finally, by hypotheses (a) and (b) at least some of the $f_j(\theta)$ have continuous inverse functions expressing $\theta$ in terms of the functions $f_j$, and hence by taking $|f_j(t') - f_j(\theta_0)|$ as small as we please, we may make $|t' - \theta_0|$ as small as we please. Consequently $t'$ converges to $\theta_0$ in probability and is consistent.
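The theorem can be watched at work in a simulation. As a hedged sketch, take (our choice, for illustration) the exponential case $f = e^{-x/\theta}/\theta$, i.e. the Type III form of Example 17.8 with $p = 1$, whose maximum likelihood estimator is $\bar{x}$; the true value $\theta_0 = 2$ and the seed are arbitrary.

```python
import random
import statistics

rng = random.Random(7)
theta_0 = 2.0          # hypothetical true value of the scale parameter


def ml_estimate(n):
    """ML estimator of theta for f = exp(-x/theta)/theta: the sample mean."""
    # expovariate takes the rate 1/theta, so its draws have mean theta_0
    return statistics.fmean(rng.expovariate(1.0 / theta_0) for _ in range(n))


errors = {n: abs(ml_estimate(n) - theta_0) for n in (10, 1000, 100000)}
print(errors)
```

The absolute error typically falls roughly like $\theta_0/\sqrt{n}$, so the last entry should be of the order of a few thousandths.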


17.24. The reader may find the foregoing proof easier to follow if we express its 
main points in geometrical terminology. 

Consider the $m$ proportions $n_j/n$ as the co-ordinates of a point in a space of $m$ dimensions. The theoretical frequencies $f_j(\theta_0) = 1/m$ define a point, say $M$, in this space, and the sample point $R$, corresponding to an observed set of $n_j$'s, may be regarded as varying round the "theoretical" point $M$. The quantities $z$ are the co-ordinates of any point in the hyperplane $\sum (z) = 1$, which contains $M$ and $R$. (See Fig. 17.1.)

Now, for any sample point $R$ the maximum likelihood estimator $t'$ assumes a value which in general differs from $\theta_0$. This value defines $m$ quantities $f_j(t')$ which determine a point $Z$. This also lies in the hyperplane, since the sum of the frequencies is unity. Thus the points $R$ determine a set of points $Z$ which all lie on the curve defined for variations in $\theta$ by

$$z_j = f_j(\theta), \qquad j = 1, 2, \ldots, m. \qquad (17.53)$$

Fig. 17.1.

Since $\theta = \theta_0$ is a possible value of $\theta$, the point $M$ lies on this curve; $R$ in general does not.

What we have shown in analytical form is that the function $K$, which is the logarithm of a likelihood function defined for any point on the hyperplane, has a maximum at $R$ and a maximum on the curve itself at $Z$. As the sample size increases, $R$ is as near as we like to $M$ (in the sense of convergence in probability, that is to say, as high a proportion of points $R$ as we like are as near as we like to $M$). This involves that $Z$ also is as near as we like to $M$. This in turn involves that the parameter-value corresponding to $Z$ is as close as we like to $\theta_0$ for as high a proportion of the possible points $Z$ as we like, which is our theorem.


17.25. We now prove a second fundamental property of maximum likelihood estimators, namely that they tend to normality for large $n$. More precisely,

if condition (a) at the beginning of 17.23 is satisfied; and if (more stringently than condition (b) of that section) (c) in a $\theta$-interval containing the true value $\theta_0$, $\partial f/\partial\theta$ is continuous in $\theta$ for every $x$, $x^2\,\partial f/\partial\theta$ approaches a continuous function of $\theta$ as $x$ tends to infinity, and $\partial f/\partial\theta$ does not vanish in some interval,

then the maximum likelihood estimator $t$ tends to normality for large $n$. The condition as to $x^2\,\partial f/\partial\theta$ ensures that in the transformation to finite range $\partial f/\partial\theta$ remains continuous in $\theta$ throughout that range.
We recall that if

$$\xi_j = \frac{n_j}{n} - \frac{1}{m}, \qquad (17.54)$$

that is, if the $\xi$'s are the deviations of the actual proportional frequencies $n_j/n$ from the "expected" frequencies $1/m$, the distribution of the $\xi$'s in the limit will be normal and their distribution spherically symmetric. Consider again the space of the previous section. The sample points are distributed about the point $M$ in a symmetrical form which tends to normality. If we choose a set of orthogonal axes in the hyperplane, the projection of the sample points on any axis is in the limit distributed normally with variance $1/(mn)$.

In the neighbourhood of $M$ the curve (17.53) approaches its tangent line as $n$ becomes larger, and we therefore have, if $s$ is the distance along the tangent from $M$,

$$s^2 = (\theta - \theta_0)^2 \sum_j \left\{\frac{\partial f_j(\theta_0)}{\partial \theta}\right\}^2, \qquad (17.55)$$

as follows from (17.53). (The tangent exists in virtue of our hypothesis as to the differential coefficients of $f$ in $\theta$.)

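Now consider the point $Z$ on the curve corresponding to the sample point $R$. We know that at $Z$ the function

$$K = \sum_j n_j \log\left(\frac{1}{m} + z_j\right), \qquad (17.56)$$

where we now measure $z$ from $M$, is a maximum for variations in $z$ such that $Z$ lies on the curve. $R$ is determined by finding the hypersurface $K = \text{constant}$ which touches the hyperplane $\sum (z_j) = 0$, for at that point $dK$ vanishes for all variations within the hyperplane. We know that the co-ordinates of this point are $z_j = n_j/n - 1/m$ and that $R$ is the point of tangency. $K_R$ as defined in 17.23 is the value of $K$ at $R$, and $K_Z$ is that at $Z$. We then have, by Taylor's theorem,

$$K_Z = K_R + \sum_j \left(\frac{\partial K}{\partial z_j}\right)_R \delta z_j + \frac{1}{2}\sum_j \sum_k \left(\frac{\partial^2 K}{\partial z_j\, \partial z_k}\right)_R \delta z_j\, \delta z_k \qquad (17.57)$$

to the second order of small quantities in $\delta z$. From (17.56) we see that, at $R$,

$$\frac{\partial K}{\partial z_j} = n, \qquad (17.58)$$

$$\frac{\partial^2 K}{\partial z_j\, \partial z_k} = 0, \quad j \neq k; \qquad \frac{\partial^2 K}{\partial z_j^2} = -\frac{n^2}{n_j}. \qquad (17.59)$$

Hence

$$K_Z = K_R + n \sum_j \delta z_j - \frac{n^2}{2}\sum_j \frac{(\delta z_j)^2}{n_j}.$$

Now $\sum (\delta z_j) = 0$, for the variation takes place in the hyperplane. Hence, for given $R$,

$$K_Z = K_R - \frac{n^2}{2}\sum_j \frac{(\delta z_j)^2}{n_j}. \qquad (17.60)$$

$Z$ is the point for which $\sum_j (\delta z_j)^2/n_j$ is a minimum. As $n$ tends to infinity the $n_j$'s tend to equality, and hence $Z$ is the point on the curve which is nearest to $R$. Thus $R$ is, in the limit, projected orthogonally on to the curve, that is to say, in the limit, on to the tangent line.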

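Now we know that these points are distributed normally with variance $1/(mn)$, and this proves the theorem. We may also evaluate the variance of the maximum likelihood estimator; for

$$\operatorname{var} t' = \frac{\operatorname{var} s}{\sum_j (\partial f_j/\partial\theta)^2} = \frac{1}{mn \sum_j (\partial f_j/\partial\theta)^2}, \qquad (17.61)$$

and since $t'$ approaches $t$ for fine grouping we have also, remembering that $1/m = f_j(\theta_0)$,

$$\frac{1}{\operatorname{var} t} = n \sum_j \frac{1}{f_j}\left(\frac{\partial f_j}{\partial\theta}\right)^2 \to n\int_{-\infty}^{\infty} \frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2 dx, \qquad (17.62)$$

where $\theta$ is to be put equal to $\theta_0$ on the right.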

It may be remarked that condition (c) at the beginning of the section prevents the vanishing of $\partial f/\partial\theta$, which might render the expression (17.61) nugatory.

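17.26. We have, then, under the afore-mentioned conditions,

$$\frac{1}{\operatorname{var} t} = n\, E\left\{\left(\frac{\partial \log f}{\partial\theta}\right)^2\right\}.$$

If the range is independent of $\theta$, or if $f$ and $\partial f/\partial\theta$ vanish at any extremity of the range which depends on $\theta$, we have the alternative form

$$\frac{1}{\operatorname{var} t} = -n\, E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right). \qquad (17.63)$$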

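In fact, since $\int_a^b f\, dx = 1$, where $a$, $b$ are the limits of the range and may contain $\theta$, we have*

$$\int_a^b \frac{\partial f}{\partial\theta}\, dx = -[f]_{x=b}\frac{db}{d\theta} + [f]_{x=a}\frac{da}{d\theta} = 0,$$

the right-hand side vanishing under either of the conditions just stated. Differentiating again, we have

$$\int_a^b \frac{\partial^2 \log f}{\partial\theta^2} f\, dx + \int_a^b \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx = -\left[\frac{\partial f}{\partial\theta}\right]_{x=b}\frac{db}{d\theta} + \left[\frac{\partial f}{\partial\theta}\right]_{x=a}\frac{da}{d\theta}. \qquad (17.64)$$

Again, if the range is independent of $\theta$ or if $\partial f/\partial\theta$ vanishes at the extremity, the last two terms on the right in (17.64) are zero, and we have (reverting to our usual convention as to limits)

$$E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) = -E\left\{\left(\frac{\partial \log f}{\partial\theta}\right)^2\right\},$$

and the result follows from (17.62).

* The operation of differentiating under the integral sign requires certain conditions as to uniform convergence, even when the limits are independent of $\theta$. To avoid prolixity we shall always assume that the conditions hold unless the contrary is stated. The point gives rise to no statistical difficulty but is troublesome when one is aiming at complete mathematical rigour.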
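The equality of the two forms of (17.63) can be checked numerically for the Cauchy location family $f = 1/[\pi\{1 + (x - \theta)^2\}]$, which appears in Example 17.11. The sketch below (our own construction) substitutes $x - \theta = \tan y$, under which $f\,dx = dy/\pi$ on $(-\pi/2, \pi/2)$, and applies the midpoint rule; both expectations should come out at $1/2$.

```python
import math


def cauchy_information(num_points=20000):
    """Return (E[(d log f/d theta)^2], -E[d^2 log f/d theta^2]) for the Cauchy location family.

    Uses x - theta = tan(y), under which f dx = dy / pi on (-pi/2, pi/2),
    and the midpoint rule in y.
    """
    h = math.pi / num_points
    e_score_sq = 0.0
    e_second = 0.0
    for i in range(num_points):
        y = -math.pi / 2 + (i + 0.5) * h
        u = math.tan(y)                              # u = x - theta
        score = 2 * u / (1 + u * u)                  # d log f / d theta
        second = (2 * u * u - 2) / (1 + u * u) ** 2  # d^2 log f / d theta^2
        e_score_sq += score ** 2 * h / math.pi
        e_second += second * h / math.pi
    return e_score_sq, -e_second


first_form, second_form = cauchy_information()
print(round(first_form, 10), round(second_form, 10))
```

In the variable $y$ both integrands are trigonometric polynomials, which is why the simple midpoint rule is so accurate here.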

17.27. We now prove a third fundamental property, concerning the efficiency of maximum likelihood estimates.

If $t$ is any estimator of $\theta$, the range of $f(x, \theta)$ is independent of $\theta$, and in large samples $t$ is distributed normally about mean $\theta_0$ (the true value of $\theta$) with variance $v$, then

$$\frac{1}{v} \leq n\int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx, \quad \text{with } \theta = \theta_0 \text{ on the right},$$

and hence, if a maximum likelihood estimator exists, it is most-efficient in the class of such estimators.

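By hypothesis, we have in the limit for the frequency function of $t$

$$\phi = \frac{1}{\sqrt{2\pi v}} \exp\left\{-\frac{(t - \theta)^2}{2v}\right\}, \qquad (17.65)$$

and hence

$$\frac{\partial \log\phi}{\partial\theta} = \frac{t - \theta}{v}, \qquad \frac{\partial^2 \log\phi}{\partial\theta^2} = -\frac{1}{v}, \qquad (17.66)$$

where, for convenience, we drop the suffix of $\theta_0$ until the end of the proof. We then have

$$\frac{1}{v} = \int \left(\frac{\partial \log\phi}{\partial\theta}\right)^2 \phi\, dt = \int \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt. \qquad (17.67)$$

Now consider

$$u = \frac{1}{L}\frac{\partial L}{\partial\theta} = \frac{\partial}{\partial\theta}(\log L) \qquad (17.68)$$

as a random variable over the possible values of $x_1, \ldots, x_n$ conditioned by $t = \text{constant}$.

Since the frequency of a sample is $L$, we have

$$\operatorname{var} u = \frac{S(Lu^2)}{S(L)} - \left\{\frac{S(Lu)}{S(L)}\right\}^2, \qquad (17.69)$$

with summation (or integration), denoted by $S$, over the range of $x$'s for which $t$ is constant. Now $\phi$ is the frequency of all samples having a constant $t$, and hence

$$\phi = S(L).$$

Hence

$$\operatorname{var} u = \frac{1}{\phi}\, S\left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} - \frac{1}{\phi^2}\left\{S\left(\frac{\partial L}{\partial\theta}\right)\right\}^2. \qquad (17.70)$$

Now $\operatorname{var} u$ cannot be negative and $\phi$ is not negative, and hence

$$S\left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} \geq \frac{1}{\phi}\left\{S\left(\frac{\partial L}{\partial\theta}\right)\right\}^2. \qquad (17.71)$$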





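But

$$\frac{\partial\phi}{\partial\theta} = S\left(\frac{\partial L}{\partial\theta}\right),$$

and hence, substituting in (17.71) and integrating over all $t$, we have

$$\int S\left\{\frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2\right\} dt \geq \int \frac{1}{\phi}\left(\frac{\partial\phi}{\partial\theta}\right)^2 dt. \qquad (17.72)$$

Now $S$ is carried out over all $x$ for constant $t$ and the integration over all $t$, so that the two summations together are equivalent to summation over the $x$'s without restriction. Hence, using (17.67),

$$\frac{1}{v} \leq \int \cdots \int \frac{1}{L}\left(\frac{\partial L}{\partial\theta}\right)^2 dx_1 \cdots dx_n = n\int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx,$$

which establishes the result, since the expression on the right is the reciprocal of the variance of the maximum likelihood estimator, if it exists.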
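The inequality can be illustrated by simulation. In the sketch below (normal parent $N(\theta, 1)$, sample size 101, 4000 replicates and the seed are all arbitrary choices), the bound $1/(nI)$ is $1/n$; the sample mean, which is the maximum likelihood estimator here (Example 17.7), attains it, while the median has a variance larger by a factor of about $\pi/2$.

```python
import random
import statistics


def simulated_variances(n=101, reps=4000, seed=2):
    """Monte Carlo variances of the mean and the median of n observations from N(0, 1)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return statistics.pvariance(means), statistics.pvariance(medians)


n = 101
var_mean, var_median = simulated_variances(n)
bound = 1.0 / n   # 1/(nI) with I = 1 for N(theta, 1)
print(round(var_mean / bound, 3), round(var_median / bound, 3))
```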


17.28. The fourth fundamental theorem of maximum likelihood estimators is as follows:

If a sufficient estimator exists, it is a function of the maximum likelihood estimator. In fact, the likelihood can then be put in the form

$$L = L_1(t, \theta)\, L_2(x_1, \ldots, x_n),$$

where $L_2$ does not contain $\theta$. Hence

$$\frac{\partial \log L}{\partial\theta} = \frac{\partial \log L_1}{\partial\theta} = \psi(\theta, t), \text{ a function of } \theta \text{ and } t \text{ only}. \qquad (17.73)$$

Hence, for fixed $t$, $\partial \log L/\partial\theta$ is constant, and it follows from the previous section that the variance of $t$ is equal to the variance of a most-efficient estimator (for $\operatorname{var} u$ is then zero for fixed $t$ and the inequality (17.72) becomes an equality). Hence the sufficient estimator is most-efficient, confirming the result of 17.18.

It follows from (17.73) that the maximum likelihood estimator is given by

$$\psi(\theta, t) = 0, \qquad (17.74)$$

which proves the theorem.

Conversely, if $t$ is such that (17.73) is true, it must be sufficient; for then we have

$$\log L = C + \int \psi(\theta, t)\, d\theta,$$

where $C$ does not depend on $\theta$, and the likelihood is of the requisite form.

Example 17.7 

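Consider the estimation of the parameter $m$ in the population

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - m)^2}{2\sigma^2}\right\} dx, \qquad -\infty < x < \infty,$$

where $\sigma$ is known. The frequency function is easily seen to obey the conditions relating to maximum likelihood estimators. We have

$$\log L = -n\log\{\sigma\sqrt{2\pi}\} - \frac{1}{2\sigma^2}\sum (x - m)^2,$$

and hence the maximum likelihood estimator is the root of

$$\frac{\partial \log L}{\partial m} = \frac{1}{\sigma^2}\sum (x - m) = 0,$$

giving

$$\hat{m} = \frac{1}{n}\sum (x) = \bar{x}.$$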


It is frequently convenient to denote the estimator of a parameter by writing a circumflex accent over it in this way.

In this case the sample mean is the maximum likelihood estimator. It is therefore most-efficient and no other estimator can have a smaller variance in the limit. For the variance we have, from (17.63),

$$\frac{1}{\operatorname{var}\hat{m}} = -n E\left(\frac{\partial^2 \log f}{\partial m^2}\right) = \frac{n}{\sigma^2},$$

giving the familiar result

$$\operatorname{var}\bar{x} = \frac{\sigma^2}{n}.$$

This, as it happens, is true for any $n$. The estimator is also sufficient, for

$$\frac{\partial \log L}{\partial m} = \frac{1}{\sigma^2}(n\bar{x} - nm),$$

a function of $m$ and $\bar{x}$ only.

The condition that $\sigma^2$ is known is to be noted. Complications arise when two parameters are estimated simultaneously, as we shall see presently.


Example 17.8 

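Consider the estimation of $\theta$ in the Type III distribution

$$dF = \frac{1}{\Gamma(p)\,\theta^p}\, x^{p-1} e^{-x/\theta}\, dx, \qquad 0 < x < \infty,$$

where $p$ is known. We have

$$\log f = (p - 1)\log x - \frac{x}{\theta} - \log\Gamma(p) - p\log\theta,$$

and hence, dropping terms independent of $\theta$,

$$\log L = -\frac{1}{\theta}\sum (x) - np\log\theta.$$

The equation of maximum likelihood is then

$$\frac{\partial \log L}{\partial\theta} = \frac{1}{\theta^2}\sum (x) - \frac{np}{\theta} = 0,$$

giving

$$\hat\theta = \frac{\sum (x)}{np} = \frac{\bar{x}}{p}.$$

For the variance we have, by (17.63),

$$\frac{1}{\operatorname{var}\hat\theta} = -n E\left(\frac{p}{\theta^2} - \frac{2x}{\theta^3}\right) = -n\left(\frac{p}{\theta^2} - \frac{2p}{\theta^2}\right) = \frac{np}{\theta^2},$$

as $E(x) = p\theta$, so that

$$\operatorname{var}\hat\theta = \frac{\theta^2}{np},$$

where $\theta$ is the true value of the parameter. We could also have obtained this result directly (and again it happens to be true for all $n$). From Example 10.11 (vol. I, p. 244) we have for the distribution of $\bar{x}/p = \hat\theta$

$$dF = \frac{1}{\Gamma(np)}\left(\frac{np}{\theta}\right)^{np} \hat\theta^{\,np-1} e^{-np\hat\theta/\theta}\, d\hat\theta,$$

from which the first two moments about the origin are

$$\mu_1' = \theta, \qquad \mu_2' = \theta^2\,\frac{np + 1}{np},$$

giving

$$\operatorname{var}\hat\theta = \mu_2' - \mu_1'^2 = \frac{\theta^2}{np}.$$

We note that the likelihood function may be put in the form

$$\log L = (p - 1)\sum \log x - n\log\Gamma(p) - \frac{np\hat\theta}{\theta} - np\log\theta,$$

from which it is evident that $\hat\theta$ is sufficient.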
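The result $\operatorname{var}\hat\theta = \theta^2/(np)$ is easily checked by simulation. A minimal sketch, with $p$, $\theta$, $n$, the number of replicates and the seed chosen arbitrarily; Python's `random.gammavariate(alpha, beta)` draws from the density $x^{\alpha-1}e^{-x/\beta}/\{\Gamma(\alpha)\beta^\alpha\}$, which is the Type III form above with $\alpha = p$, $\beta = \theta$.

```python
import random
import statistics

p, theta, n, reps = 2.5, 3.0, 40, 5000   # hypothetical settings; p is treated as known
rng = random.Random(3)

estimates = []
for _ in range(reps):
    xs = [rng.gammavariate(p, theta) for _ in range(n)]   # density x^(p-1) e^(-x/theta) / (Gamma(p) theta^p)
    estimates.append(statistics.fmean(xs) / p)           # theta_hat = xbar / p

empirical_var = statistics.pvariance(estimates)
theoretical_var = theta ** 2 / (n * p)
print(round(statistics.fmean(estimates), 4), round(empirical_var, 5), theoretical_var)
```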


Example 17.9 

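Consider the estimation of the parameter $\lambda$ in the Poisson distribution whose general term is

$$\frac{e^{-\lambda}\lambda^x}{x!}.$$

In this case the frequency function is discontinuous and we have

$$L = \frac{e^{-n\lambda}\,\lambda^{\sum (x)}}{x_1!\, x_2! \cdots x_n!}.$$

Hence

$$\frac{\partial \log L}{\partial\lambda} = -n + \frac{\sum (x)}{\lambda},$$

giving $\hat\lambda = \bar{x}$, the sample mean.

For the variance we have

$$\frac{1}{\operatorname{var}\hat\lambda} = -n E\left(\frac{\partial^2 \log f}{\partial\lambda^2}\right) = n E\left(\frac{x}{\lambda^2}\right) = \frac{n}{\lambda},$$

$$\operatorname{var}\hat\lambda = \frac{\lambda}{n},$$

a familiar result.

It is easy to see in this case also that $\hat\lambda$ is sufficient.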
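When no explicit solution is available the maximum of $\log L$ can be found numerically. As a check against the explicit answer $\hat\lambda = \bar{x}$, the sketch below maximises the Poisson log-likelihood for a made-up set of counts by golden-section search (a standard method for unimodal functions, chosen here for simplicity; it is not part of the text).

```python
import math


def poisson_loglik(lam, xs):
    """log L for the Poisson distribution, omitting -sum(log x_i!), which does not involve lambda."""
    return -len(xs) * lam + sum(xs) * math.log(lam)


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc > fd:              # maximum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:                    # maximum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


xs = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical counts
lam_hat = golden_section_max(lambda lam: poisson_loglik(lam, xs), 1e-6, 50.0)
print(round(lam_hat, 6), sum(xs) / len(xs))
```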


Example 17.10 

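What is the most general form of distribution, differentiable in $\theta$, for which the sample mean is the maximum likelihood estimator?

We are given that a solution of

$$\sum \frac{\partial \log f}{\partial\theta} = 0$$

is

$$\theta = \frac{1}{n}\sum (x), \quad \text{or} \quad \sum (x - \theta) = 0.$$

This is true for all $x$ and $\theta$, and hence

$$\frac{1}{f}\frac{\partial f}{\partial\theta} = (x - \theta)K,$$

where $K$ is independent of $x$ but may be dependent on $\theta$, say equal to $d^2\psi/d\theta^2$. Then, integrating,

$$\log f = \int \frac{d^2\psi}{d\theta^2}(x - \theta)\, d\theta = (x - \theta)\frac{d\psi}{d\theta} + \psi + C(x),$$

where $C(x)$ is an arbitrary function of $x$. Hence

$$f = \exp\left\{(x - \theta)\frac{d\psi}{d\theta} + \psi(\theta) + C(x)\right\},$$

which is the most general form of $f$.

If $\psi(\theta) = \frac{1}{2}\theta^2$ and $C(x) = -\frac{1}{2}x^2 + \log k$, the form becomes the normal distribution

$$f = k\exp\left\{-\tfrac{1}{2}(x - \theta)^2\right\}.$$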
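The general form is not confined to the normal case. As an illustration of our own, the Poisson distribution of Example 17.9 has $\partial\log f/\partial\lambda = (x - \lambda)/\lambda$, i.e. $K = 1/\lambda$, which corresponds to $\psi(\lambda) = \lambda\log\lambda - \lambda$ and $C(x) = -\log x!$ (the derivation concerns only the dependence on the parameter, so the discreteness of $x$ does not matter). The sketch confirms numerically that the general form then reproduces the Poisson log-probabilities.

```python
import math


def psi(lam):
    """Hypothetical choice psi(lam) = lam*log(lam) - lam, so that psi''(lam) = 1/lam."""
    return lam * math.log(lam) - lam


def psi_prime(lam):
    return math.log(lam)


def C(x):
    """C(x) = -log x!"""
    return -math.lgamma(x + 1)


def log_f_general(x, lam):
    """log f = (x - lam) psi'(lam) + psi(lam) + C(x), the general form found above."""
    return (x - lam) * psi_prime(lam) + psi(lam) + C(x)


def log_poisson(x, lam):
    return -lam + x * math.log(lam) - math.lgamma(x + 1)


max_gap = max(abs(log_f_general(x, lam) - log_poisson(x, lam))
              for x in range(15) for lam in (0.5, 2.0, 7.3))
print(max_gap)
```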


Successive Approximations to Efficient Estimators

17.29. In the examples we have just given, the solution of the maximum likelihood equation was carried out without difficulty. It frequently happens, however, that the equation is by no means so easy to solve explicitly, though it can sometimes be solved for particular values of $x$ by iterative methods. Another possibility is to compute an inefficient estimator and correct it by an extra term, which can be obtained as follows.

Let $t'$ be an inefficient estimator and $t$ a most-efficient estimator. Let

$$\delta = t' - t.$$

Then

$$\operatorname{var}\delta = \operatorname{var} t' + \operatorname{var} t - 2\operatorname{cov}(t', t). \qquad (17.75)$$

Remembering that, if $E$ is the efficiency of $t'$,

$$\operatorname{var} t = E\operatorname{var} t', \qquad \operatorname{cov}(t', t) = (\operatorname{var} t\, \operatorname{var} t')^{1/2}\sqrt{E} \quad \text{(see (17.39))},$$

we have

$$\operatorname{var}\delta = \frac{1 - E}{E}\operatorname{var} t. \qquad (17.76)$$

If then $t'$ is "nearly" efficient, that is, if $1 - E$ is small, the average value of $\delta^2$, where $\delta = t' - t$, will be small.

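If the maximum likelihood equation is

$$\frac{\partial \log L}{\partial\theta} = 0,$$

consider

$$t'' = t' + \operatorname{var} t\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'}. \qquad (17.77)$$

We have, to the first order,

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = \left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t} + (t' - t)\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t} = (t' - t)\left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t}.$$

For large $n$, approximately,

$$-\frac{1}{\operatorname{var} t} = \left(\frac{\partial^2 \log L}{\partial\theta^2}\right)_{\theta = t},$$

and hence, approximately,

$$\left(\frac{\partial \log L}{\partial\theta}\right)_{\theta = t'} = \frac{t - t'}{\operatorname{var} t}.$$

Hence

$$t'' = t' + (t - t') = t, \qquad (17.78)$$

and $t''$ is an efficient estimator to a better order of approximation. This process may be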
repeated and, rather like Newton’s successive approximation to the roots of an equation, 
may be expected to improve the efficiency of an estimator. 
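A minimal sketch of the correction (17.77), assuming a normal parent with known $\sigma$ and a made-up sample. In that case $\partial\log L/\partial\theta = \sum(x - \theta)/\sigma^2$ is exactly linear in $\theta$ and $\operatorname{var} t = \sigma^2/n$, so a single step from the median lands exactly on the maximum likelihood estimator $\bar{x}$, which makes a convenient check of the formula; in general the step only improves the approximation.

```python
import statistics


def one_step(t_prime, score, var_t):
    """t'' = t' + var(t) * (d log L / d theta) at theta = t', as in (17.77)."""
    return t_prime + var_t * score(t_prime)


xs = [4.1, 5.3, 3.8, 6.0, 5.5, 4.7, 5.1]   # hypothetical sample
sigma = 1.0                                # assumed known
n = len(xs)


def score(theta):
    """d log L / d theta for the normal location family with known sigma."""
    return sum(x - theta for x in xs) / sigma ** 2


t_prime = statistics.median(xs)            # an inefficient starting estimator
t_second = one_step(t_prime, score, sigma ** 2 / n)
print(t_prime, round(t_second, 10), round(statistics.fmean(xs), 10))
```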

Example 17.11

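Suppose we have to estimate $\theta$, the parameter in the Cauchy population

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$

We have already seen that the sample mean is not a satisfactory estimate and that for large samples the median is consistent and has variance $\pi^2/(4n)$.

The equation of maximum likelihood gives

$$\frac{\partial \log L}{\partial\theta} = \sum \left\{\frac{2(x - \theta)}{1 + (x - \theta)^2}\right\} = 0.$$

This is an equation of degree $2n - 1$ in $\theta$ and correspondingly difficult to solve. We may, however, find the variance of the solution $\hat\theta$ from (17.63). We have

$$-E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right) = -\frac{1}{\pi}\int_{-\infty}^{\infty} \frac{2(x - \theta)^2 - 2}{\{1 + (x - \theta)^2\}^3}\, dx = -\frac{2}{\pi}\int_{-\infty}^{\infty} \frac{(x^2 - 1)\, dx}{(1 + x^2)^3} = \frac{1}{2}.$$

Hence

$$\operatorname{var}\hat\theta = \frac{2}{n}.$$

The median, therefore, has an efficiency of $\dfrac{2/n}{\pi^2/(4n)} = \dfrac{8}{\pi^2} \approx 0.8$, and we expect that

$$t'' = t' + \frac{2}{n}\sum \left\{\frac{2(x - t')}{1 + (x - t')^2}\right\},$$

where $t'$ denotes the median, will be an improved estimator.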
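The improvement can be seen by simulation. The sketch below (sample size, number of replicates and seed are arbitrary) draws Cauchy samples by the inverse-distribution-function method, computes the median $t'$ and the corrected estimator $t'' = t' + (2/n)\sum 2(x - t')/\{1 + (x - t')^2\}$, and reports $n$ times each variance, to be compared with $\pi^2/4 \approx 2.47$ and $2$ respectively.

```python
import math
import random
import statistics


def cauchy_sample(rng, theta, n):
    """n observations from the Cauchy population centred at theta (inverse-CDF method)."""
    return [theta + math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def median_and_improved(xs):
    """Return (t', t'') with t' the median and t'' = t' + (2/n) * sum 2(x - t') / (1 + (x - t')^2)."""
    n = len(xs)
    t1 = statistics.median(xs)
    correction = (2.0 / n) * sum(2 * (x - t1) / (1 + (x - t1) ** 2) for x in xs)
    return t1, t1 + correction


rng = random.Random(4)
n, reps, theta = 201, 2000, 0.0
medians, improved = [], []
for _ in range(reps):
    t1, t2 = median_and_improved(cauchy_sample(rng, theta, n))
    medians.append(t1)
    improved.append(t2)

n_var_median = n * statistics.pvariance(medians)     # compare pi^2 / 4 = 2.47
n_var_improved = n * statistics.pvariance(improved)  # compare 2
print(round(n_var_median, 2), round(n_var_improved, 2))
```

Because each term of the correction is bounded by 1 in absolute value, $t''$ cannot inherit the heavy tails of the individual observations.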


Most General Form of Distributions possessing Sufficient Estimators
17.30. If $t$ is sufficient for $\theta$ we have

$$\frac{\partial \log L}{\partial\theta} = K(t, \theta), \qquad (17.79)$$

where $K$ is some function of $t$ and $\theta$. Regarding this as an equation in $t$ we see that it remains true for any particular value of $\theta$, say zero. It is then evident that $t$ must be expressible in the form

$$t = M\left\{\sum k(x_j)\right\}, \qquad (17.80)$$

where $M$ and $k$ are arbitrary functions. If $w = \sum k(x)$, then $K$ is a function of $\theta$ and $w$ only, say $N(\theta, w)$. We have then

$$\frac{\partial^2 \log L}{\partial\theta\, \partial x_j} = \frac{\partial N}{\partial w}\,\frac{\partial w}{\partial x_j}. \qquad (17.81)$$

Now the left-hand side is a function of $\theta$ and $x_j$ only, and $\partial w/\partial x_j$ is a function of $x_j$ only. Hence $\partial N/\partial w$ is a function of $\theta$ and $x_j$ only. But it must be symmetrical in the $x$'s and hence is a function of $\theta$ only. Hence, integrating with respect to $w$, we have

$$N(\theta, w) = w\,p(\theta) + q(\theta),$$

where $p$ and $q$ are arbitrary functions of $\theta$. Thus

$$\frac{\partial}{\partial\theta}(\log L) = \sum_j \frac{\partial}{\partial\theta}\log f(x_j, \theta) = p(\theta)\sum_j k(x_j) + q(\theta), \qquad (17.82)$$

whence

$$\frac{\partial}{\partial\theta}\log f(x, \theta) = p(\theta)\,k(x) + \frac{1}{n}q(\theta),$$

giving

$$f(x, \theta) = \exp\{p(\theta)\,k(x) + q(\theta) + r(x)\}, \qquad (17.83)$$

where we still write $p$ and $q$ for the integrated functions.

The expression may also be written

$$f(x, \theta) = Q(\theta)\,R(x)\exp\{p(\theta)\,k(x)\}, \qquad (17.84)$$

or, if we simplify the specification of the distribution by writing $\theta$ instead of $p(\theta)$,

$$f(x) = Q(\theta)\,R(x)\exp\{\theta\,k(x)\}. \qquad (17.85)$$

It will be found that if (17.85) holds, the likelihood function is of the required form for the existence of a sufficient estimator, so that the condition is sufficient as well as necessary.


Distribution of Sufficient Estimators 

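17.31. It is remarkable that the distribution of a sufficient estimator can be obtained directly from the likelihood function. From (17.85) we have

$$\log L = n\log Q + \sum \log R(x) + \theta\sum k(x),$$

giving, for the maximum likelihood estimator,

$$\sum k(x) = -n\,\frac{Q'(t)}{Q(t)}. \qquad (17.86)$$

Now, for the characteristic function $\phi(\alpha)$ of $w$ $(= \sum k(x))$ we have

$$\phi(\alpha) = \int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} e^{i\alpha w} f(x_1, \theta)\, dx_1 \cdots f(x_n, \theta)\, dx_n = \left\{\int_{-\infty}^{\infty} Q(\theta)\,R(x)\, e^{(\theta + i\alpha)k(x)}\, dx\right\}^n = \left\{\frac{Q(\theta)}{Q(\theta + i\alpha)}\right\}^n.$$

Hence the frequency function of $w$, if existent, is

$$\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\alpha w}\left\{\frac{Q(\theta)}{Q(\theta + i\alpha)}\right\}^n d\alpha. \qquad (17.87)$$

Now from (17.86),

$$w = -n\,\frac{Q'(t)}{Q(t)} = n\,S(t), \text{ say},$$

and hence the frequency function of the estimator $t$ is

$$\frac{n}{2\pi}\left|\frac{dS}{dt}\right|\int_{-\infty}^{\infty} e^{-i\alpha nS(t)}\left\{\frac{Q(\theta)}{Q(\theta + i\alpha)}\right\}^n d\alpha. \qquad (17.88)$$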
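Example 17.12

The normal distribution with unit variance may be put in the form

$$f = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}(x - \theta)^2} = e^{-\frac{1}{2}\theta^2}\cdot\frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}\cdot e^{\theta x}.$$

Comparing this with (17.85), we see that if

$$Q(\theta) = e^{-\frac{1}{2}\theta^2}, \qquad R(x) = \frac{e^{-\frac{1}{2}x^2}}{\sqrt{2\pi}}, \qquad k(x) = x,$$

the condition for a sufficient estimator is satisfied. That this is (as we already know) the mean $\bar{x}$ may be confirmed from (17.88). We have

$$-S(\theta) = \frac{d}{d\theta}\log Q(\theta) = -\theta,$$

so that $w = n\bar{x} = nS(t) = nt$, and hence for the frequency function of the estimator $\bar{x}$

$$\frac{n}{2\pi}\int_{-\infty}^{\infty} e^{-i\alpha n\bar{x}}\,\frac{e^{-\frac{1}{2}n\theta^2}}{e^{-\frac{1}{2}n(\theta + i\alpha)^2}}\, d\alpha = \frac{n}{2\pi}\int_{-\infty}^{\infty} \exp\left\{-\tfrac{1}{2}n\alpha^2 - i\alpha n(\bar{x} - \theta)\right\} d\alpha = \sqrt{\frac{n}{2\pi}}\exp\left\{-\tfrac{1}{2}n(\bar{x} - \theta)^2\right\}.$$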
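The inversion formula can be checked numerically for this example. The sketch below, an illustration of our own, evaluates the integral in (17.88) with $Q(\theta) = e^{-\theta^2/2}$ and $S(t) = t$ by the midpoint rule, truncating the range of $\alpha$ at $\pm 12$ (adequate for the small $n$ used; for a much wider range $Q(\theta + i\alpha)$ would overflow), and compares the result with the normal density of the mean.

```python
import cmath
import math


def Q(theta):
    """Q(theta) = exp(-theta^2 / 2) from Example 17.12; accepts complex arguments."""
    return cmath.exp(-theta ** 2 / 2)


def density_of_t(t, theta, n, half_width=12.0, steps=6000):
    """Evaluate (17.88) with S(t) = t by the midpoint rule over alpha in [-half_width, half_width]."""
    h = 2 * half_width / steps
    total = 0j
    for k in range(steps):
        alpha = -half_width + (k + 0.5) * h
        total += cmath.exp(-1j * alpha * n * t) * (Q(theta) / Q(theta + 1j * alpha)) ** n
    return (n / (2 * math.pi)) * total.real * h


def normal_density_of_mean(t, theta, n):
    return math.sqrt(n / (2 * math.pi)) * math.exp(-n * (t - theta) ** 2 / 2)


theta, n = 0.3, 5
for t in (-0.5, 0.3, 1.0):
    print(t, density_of_t(t, theta, n), normal_density_of_mean(t, theta, n))
```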


Example 17.13 

The Type III distribution considered in Example 17.8 may be put in the slightly different form

$$dF = \frac{\gamma^p}{\Gamma(p)}\, x^{p-1} e^{-\gamma x}\, dx, \qquad 0 < x < \infty.$$

Regarding $p$ as known and considering $\gamma$ as the parameter under estimate, we see that a sufficient estimator exists, because we may write

$$Q(\gamma) = \frac{\gamma^p}{\Gamma(p)}, \qquad R(x) = x^{p-1}, \qquad k(x) = -x,$$

which throws the distribution into the form (17.85). We have found the estimator and its distribution in Example 17.8.

On the other hand, suppose that $\gamma$ is known and we wish to estimate $p$. Writing

$$Q(p) = \frac{\gamma^p}{\Gamma(p)}, \qquad R(x) = \frac{e^{-\gamma x}}{x}, \qquad k(x) = \log x,$$

we see that a sufficient estimator for $p$ also exists. It is the solution of

$$\frac{d}{dp}\log\Gamma(p) = \log\gamma + \frac{1}{n}\sum \log x,$$

which does not permit of expression of $\hat{p}$ as a simple function of the $x$'s. The sampling distribution is not expressible in a simple form.
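The equation for $\hat p$ has no closed-form solution, but it is easily solved numerically. In the sketch below, $d\log\Gamma(p)/dp$ is approximated by a central difference of `math.lgamma` (Python's standard library has no digamma function), and the equation is solved by bisection, this derivative being increasing in $p$. Because the sample enters only through $\sum\log x$, two hypothetical samples with the same product give the same estimate; a simulated sample with $p = 3$, $\gamma = 1$ checks that the estimate lands near the true value.

```python
import math
import random


def digamma(p, h=1e-5):
    """d/dp log Gamma(p) by a central difference of math.lgamma (an approximation)."""
    return (math.lgamma(p + h) - math.lgamma(p - h)) / (2 * h)


def shape_estimate(xs, gamma_rate, lo=1e-3, hi=1e3, iterations=200):
    """Solve digamma(p) = log(gamma) + mean(log x) for p by bisection.

    The sample enters only through sum(log x), i.e. through the sufficient statistic.
    """
    target = math.log(gamma_rate) + sum(math.log(x) for x in xs) / len(xs)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if digamma(mid) < target:   # digamma is increasing, so the root lies above mid
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Two hypothetical samples with the same product (hence the same sum of logs).
print(shape_estimate([1.0, 4.0], 1.0), shape_estimate([2.0, 2.0], 1.0))

rng = random.Random(5)
big = [rng.gammavariate(3.0, 1.0) for _ in range(20000)]   # p = 3, gamma = 1 (scale 1)
print(round(shape_estimate(big, 1.0), 3))
```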


Example 17.14 

Consider again the Cauchy distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$


Evidently this cannot be thrown into the form (17.85) and hence no sufficient estimator 
exists. We have already found (Example 17.11) that there is an efficient estimator. For 
finite n no single estimator can contain all that the sample can tell us about 0. 


Sufficient Estimators when the Range depends on the Parameter 

17.32. One of the conditions of the theorem of 17.23 and that of 17.27 is that the range should be independent of $\theta$. In the contrary case our results, particularly for sufficient estimators, require reconsideration.

Suppose the range of the frequency function is from $\theta$ to $b$, where $b$ is fixed. If there is a sufficient estimator for $\theta$, say $t$, the distribution of any other estimator, for fixed $t$, is independent of $\theta$. Take $x_1$, the lowest value of the sample, as such other estimator. Then if $t$ is fixed the distribution of $x_1$ is independent of $\theta$, which is clearly impossible unless in fixing $t$ we also fix $x_1$, that is to say, $t$ is a function of $x_1$. Thus if a sufficient estimator exists it must be a function of $x_1$.

Similarly, if the range is from $a$ to $\theta$, a sufficient estimator for $\theta$ must be a function of the largest sample member.

17.33. If $x_1$, or some function of it, is sufficient for $\theta$, the lower extremity of the range, and $x_1$ is fixed, the probability that any particular sample value $x$ is greater than $x_1$ is proportional to $f(x, \theta)$. This must be independent of $\theta$, since $x_1$ is sufficient, and hence so is $f(x, \theta)/f(x_1, \theta)$. Thus

$$f(x, \theta) = \frac{g(x)}{h(\theta)}, \qquad (17.89)$$

and this is the most general form admitting a sufficient estimator.

It remains true in such circumstances that the smallest member of the sample is a maximum likelihood estimator. For the likelihood is

$$L = \frac{g(x_1)\, g(x_2) \cdots g(x_n)}{\{h(\theta)\}^n},$$

which is clearly a maximum when $h(\theta)$ is a minimum. Now since the total frequency is unity we have, from (17.89),

$$h(\theta) = \int_\theta^b g(x)\, dx. \qquad (17.90)$$

$\theta$ cannot be greater than $x_1$, for then such a sample value could not appear. The value which minimises $h(\theta)$ is seen from (17.90) to be that which minimises the range, i.e. $x_1$.
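As a hypothetical illustration of (17.89) and (17.90), take $g(x) = e^{-x}$ and $b = \infty$, so that $h(\theta) = e^{-\theta}$ and $f(x, \theta) = e^{-(x - \theta)}$ for $x \geq \theta$. A grid search over $\theta$ (a made-up sample; the grid is arbitrary) shows the likelihood increasing right up to the smallest sample member and vanishing beyond it.

```python
import math


def log_likelihood(theta, xs):
    """log L for f(x, theta) = g(x)/h(theta) with g(x) = e^(-x), h(theta) = e^(-theta), x >= theta."""
    if theta > min(xs):
        return -math.inf          # a sample value below theta could not have appeared
    return -sum(xs) + len(xs) * theta


xs = [2.7, 3.4, 2.2, 5.9, 2.9]    # hypothetical sample
grid = [1.0 + 0.001 * k for k in range(2001)]   # candidate theta values in [1, 3]
best = max(grid, key=lambda th: log_likelihood(th, xs))
print(round(best, 3), min(xs))
```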

17.34. When both extremes of the range, $a$ and $b$, depend on $\theta$, some further modification is necessary. Suppose that $a$ is equal to $\theta$ and that $b(\theta)$ is some strictly decreasing function of $\theta$. Let $X_n$ be the value such that $b(X_n) = x_n$, the greatest member of the sample, and let $t$ be the smaller of $x_1$ and $X_n$. Then of the equalities

$$t = x_1, \qquad b(t) = x_n \qquad (17.91)$$

one at least is true. But the first equality implies that $t \geq \theta$ and the second that $b(t) \leq b(\theta)$, and either of these two implies the other. Hence both of these inequalities are true, and

$$\theta \leq t \leq x_1 \leq x_n \leq b(t) \leq b(\theta). \qquad (17.92)$$

Samples with fixed $t$ then lie in a fixed range, and hence $t$ is sufficient if the frequency function is of the form (17.89). It would seem that this remains the most general form of frequency function admitting a sufficient estimator when both extremes of the range depend on $\theta$.


Example 17.15

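Consider the rectangular distribution

$$dF = \frac{dx}{2\theta}, \qquad -\theta < x < \theta.$$

If we take the ordinary likelihood equation we get

$$\frac{\partial \log L}{\partial\theta} = \frac{\partial}{\partial\theta}\{-n\log(2\theta)\} = -\frac{n}{\theta}.$$

For this to vanish $\theta$ must tend to infinity, an obviously nugatory result. In accordance with the above discussion (with $-\theta$ in the role of the parameter there), we should take as our estimate of $\theta$ the larger of $-x_1$ and $x_n$, that is, the largest absolute value in the sample; and this is obviously sufficient, for nothing in the sample can tell us more about the terminals of the range than its most extreme members.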
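A short sketch of the same point for the rectangular distribution, with a simulated sample ($\theta = 1.5$, $n = 50$ and the seed are arbitrary): the likelihood $(2\theta)^{-n}$ is defined only for $\theta$ at least as large as the greatest $|x|$, decreases thereafter, and is therefore greatest at $\max|x|$.

```python
import math
import random


def log_likelihood(theta, xs):
    """log L for dF = dx/(2 theta), -theta < x < theta."""
    if theta < max(abs(x) for x in xs):
        return -math.inf           # some observation would lie outside the range
    return -len(xs) * math.log(2 * theta)


rng = random.Random(6)
theta_true = 1.5                   # hypothetical true value
xs = [rng.uniform(-theta_true, theta_true) for _ in range(50)]
theta_hat = max(abs(x) for x in xs)

print(round(theta_hat, 4), log_likelihood(theta_hat, xs), log_likelihood(0.999 * theta_hat, xs))
```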


Intrinsic Accuracy 

17.35. If the sampling distribution of an estimator $t$ is

$$dF = \phi(t, \theta)\, dt, \qquad (17.93)$$

we define the accuracy of $t$ as

$$r = \int \left(\frac{\partial \log\phi}{\partial\theta}\right)^2 \phi\, dt. \qquad (17.94)$$

It is evidently essentially a positive quantity. We assume, unless the contrary is stated, that the range is independent of $\theta$.

$r$ is the quantity we have already encountered in (17.67) as the reciprocal of the variance of $t$ when it tends to normality in large samples. As in 17.27, we have

$$r \leq n\int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx = nI, \qquad (17.95)$$

say, where

$$I = \int \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\, dx. \qquad (17.96)$$

Now $I$ is independent of the estimator $t$ and we may call it the intrinsic accuracy of the distribution $f$ in regard to $\theta$. It is intrinsic because it depends only on $f$. It may be termed accuracy because, for large samples at least, it fixes a minimum, $1/(nI)$, to the variance of possible estimators of $\theta$. We know from 17.25 that under certain conditions the maximum likelihood estimator attains this minimum for large samples.

17.36. We may now extend the definition of efficiency of an estimator to the case 
of small samples. In fact, the efficiency is the ratio of the accuracy of an estimator to the 
intrinsic accuracy of the distribution for the parameter under estimate. This is easily 
seen to apply to the case of large samples for which efficiency was defined in 17.12, and 
may be applied to finite samples or non-normal sampling variation. For such cases, 
however, it is conceivable that the efficiency might exceed unity. A proof that this is not 
so when the range is independent of $\theta$ is suggested in Exercise 17.12.


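17.37. If the range is independent of $\theta$ we have

$$\int \frac{\partial f}{\partial\theta}\, dx = 0, \qquad \int \frac{\partial^2 f}{\partial\theta^2}\, dx = 0,$$

and hence the following three expressions for the intrinsic accuracy are equivalent:

$$I = E\left\{\left(\frac{\partial \log f}{\partial\theta}\right)^2\right\} = \operatorname{var}\left(\frac{\partial \log f}{\partial\theta}\right) = -E\left(\frac{\partial^2 \log f}{\partial\theta^2}\right). \qquad (17.97)$$

This equivalence holds if $f$ is zero at the extremes of the range. For we then have

$$\int_a^b \frac{\partial f}{\partial\theta}\, dx = \frac{d}{d\theta}\int_a^b f\, dx - [f]_{x=b}\frac{db}{d\theta} + [f]_{x=a}\frac{da}{d\theta} = 0.$$

But if $f$ is not zero at the extremes the equivalence may break down. (Cf. Exercises 17.9 and 17.11.)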
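The breakdown can be seen numerically. The sketch below (our own illustration; the two distributions are not those of the cited exercises) evaluates three natural expressions for the intrinsic accuracy, $E\{(\partial\log f/\partial\theta)^2\}$, $\operatorname{var}(\partial\log f/\partial\theta)$ and $-E(\partial^2\log f/\partial\theta^2)$, by the midpoint rule for the exponential distribution $e^{-x/\theta}/\theta$, whose range does not depend on $\theta$, and for the rectangular distribution $1/\theta$ on $(0, \theta)$, where $f$ is not zero at the terminal $x = \theta$. In the first case all three agree at $1/\theta^2$; in the second they give $1/\theta^2$, $0$ and $-1/\theta^2$.

```python
import math


def three_forms(f, dlogf, d2logf, a, b, steps=200000):
    """Midpoint-rule values of E[(dlogf)^2], var(dlogf) and -E[d2logf] over the range (a, b)."""
    h = (b - a) / steps
    e1 = e2 = e_second = 0.0
    for k in range(steps):
        x = a + (k + 0.5) * h
        w = f(x) * h
        s = dlogf(x)
        e1 += s * w
        e2 += s * s * w
        e_second += d2logf(x) * w
    return e2, e2 - e1 ** 2, -e_second


theta = 2.0

# Range independent of theta: f = e^(-x/theta)/theta on (0, inf), truncated at x = 200.
expo = three_forms(lambda x: math.exp(-x / theta) / theta,
                   lambda x: x / theta ** 2 - 1 / theta,
                   lambda x: -2 * x / theta ** 3 + 1 / theta ** 2,
                   0.0, 200.0)

# Range (0, theta) depends on theta and f = 1/theta is not zero at x = theta.
rect = three_forms(lambda x: 1 / theta,
                   lambda x: -1 / theta,
                   lambda x: 1 / theta ** 2,
                   0.0, theta, steps=1000)

print([round(v, 6) for v in expo], [round(v, 6) for v in rect])
```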


Amount of Information 

17.38. The quantity $nI$ has been called the amount of information about $\theta$ in the sample of $n$, and $I$ may be called the amount of information per member of the sample. The use of "information" in this specialised sense has not been universally accepted, but some of the properties of $I$ are such as we should require of any measure of information.

(a) If the parent does not contain $\theta$, $I = 0$, so that no sample can tell us anything about $\theta$, which must obviously be so.

(b) Since sufficient estimators contain all the relevant information in the sample, we expect their accuracy to be $nI$, and conversely. That this is so may be seen as in 17.27 and 17.28. In fact, if $t$ is such that the equality in (17.72) holds, $\operatorname{var} u = 0$ and, for fixed $t$, $\partial \log L/\partial\theta$ is constant, irrespective of the form of distribution of $t$. $\log L$ is then of the type required for sufficiency.

(c) The sum of the amounts of information in two independent sample-members is the amount of information in the pair taken together. For if their joint distribution is

$$dF = f_1(x, \theta)\, dx\; f_2(y, \theta)\, dy,$$

we have for the intrinsic accuracy

$$I = -E\left\{\frac{\partial^2}{\partial\theta^2}(\log f_1 + \log f_2)\right\} = -E\left(\frac{\partial^2 \log f_1}{\partial\theta^2}\right) - E\left(\frac{\partial^2 \log f_2}{\partial\theta^2}\right) = I_1 + I_2, \qquad (17.98)$$

which is the property stated.


Loss of Accuracy 

17.39. Where no sufficient estimator exists, it follows from (b) of the previous paragraph that no estimator for finite $n$ can contain all the information in the sample. In so far as any particular estimator falls short of the ideal we may be said to lose information by using it. No estimator can avoid losing something, although of course some may lose less than others.

Presumably the loss will be greater for large samples than for small ones, and will be least for maximum likelihood estimators. We may calculate the loss in this case. If $t$ is the maximum likelihood estimator of $\theta$, we have, to a first approximation,

$$\frac{\partial \log L}{\partial\theta} = (\theta - t)\,\frac{\partial^2 \log L}{\partial\theta^2}. \qquad (17.99)$$

The variance of $\partial \log L/\partial\theta$ in samples for which $t$ is constant is thus the variance of $\partial^2 \log L/\partial\theta^2$ within the set multiplied by $(t - \theta)^2$. Now the total loss of information, from 17.27, is seen to be the mean value over $t$ of $\operatorname{var} u$, the variance of $u = \partial \log L/\partial\theta$ within sets for which $t$ is constant, and hence is equal to the variance of $t$ multiplied by the total variance of $\partial^2 \log L/\partial\theta^2$ within sets for which $t$ is constant. This we now evaluate.

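Suppose the distribution is grouped so that the “expected” frequency in the $j$th group is $m_j$. The likelihood is then proportional to $m_1^{n_1} m_2^{n_2} \cdots$, and apart from constants independent of $\theta$ we have

$$\log L = \sum_j n_j\log m_j. \qquad (17.100)$$

We have at once, writing $m' = \partial m/\partial\theta$ and $m'' = \partial^2 m/\partial\theta^2$,

$$\frac{\partial\log L}{\partial\theta} = \sum n\,\frac{m'}{m}, \qquad (17.101)$$

$$\frac{\partial^2\log L}{\partial\theta^2} = \sum n\left(\frac{m''}{m} - \frac{m'^2}{m^2}\right). \qquad (17.102)$$

Moreover, since $\sum(m) = n$ whatever $\theta$ may be,

$$\sum(m') = \sum(m'') = 0. \qquad (17.103)$$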
We shall find it most convenient to regard the $n$'s as distributed over the groups first of all without restriction. They are then made subject to two linear constraints expressed by $\sum(n) = n$ and

$$\frac{\partial\log L}{\partial\theta} = \sum\frac{nm'}{m} = \text{constant}.$$

From this viewpoint the $n$'s may be regarded as distributed in the Poisson form with mean and variance $m$. They are not binomial, because we are not introducing the restriction that the samples should be of fixed size, except as a constraint.

Now if $\sum(kn)$ is a linear function of the $n$'s subject to a linear constraint $\sum(\alpha n) = p$, its variance is

$$\sum(k^2m) - \frac{\{\sum(k\alpha m)\}^2}{\sum(\alpha^2m)}, \qquad (17.104)$$


and a second constraint reduces the variance by a term similar to the second in this expression. The result may be seen from geometrical considerations. We may write

$$\sum(kn) = \sum\left(k\sqrt{m}\,\frac{n}{\sqrt{m}}\right) \qquad\text{and}\qquad \sum(\alpha n) = \sum\left(\alpha\sqrt{m}\,\frac{n}{\sqrt{m}}\right),$$

where the variables $n/\sqrt{m}$ have unit variance and mean $\sqrt{m}$. Consider the different values of the $n$'s, say $s$ in number, as the co-ordinates in a Euclidean space. The density function of the variables is then symmetrical about a point $(\sqrt{m_1}, \sqrt{m_2}, \ldots, \sqrt{m_s})$, to which we transfer the origin. The variance of the unconstrained variable is then equal to the reciprocal of the square of the distance from the origin to the hyperplane $\sum(k\sqrt{m}\,x) = 1$, namely, to $\sum(k^2m)$.

But when the constraint is imposed, the variance becomes the reciprocal of the square of the distance from the origin to this hyperplane, measured in the direction parallel to $\sum(\alpha\sqrt{m}\,x) = 0$. It is hence reduced by the amount

$$\cos^2\phi\,\sum(k^2m),$$

where $\phi$ is the angle between the planes. This quantity is

$$\frac{\{\sum(k\sqrt{m}\,\alpha\sqrt{m})\}^2}{\sum(k^2m)\,\sum(\alpha^2m)}\,\sum(k^2m) = \frac{\{\sum(k\alpha m)\}^2}{\sum(\alpha^2m)},$$

which gives us the second term in (17.104).

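Now for the first linear constraint $\sum(n) = \text{constant} = n$ we have $\alpha = 1$, and the reducing term is (since $\sum(m) = n$ also)

$$\frac{1}{n}\Big\{\sum(km)\Big\}^2.$$

For the second constraint we have $\alpha = m'/m$ and hence the term is

$$\frac{\{\sum(km')\}^2}{\sum(m'^2/m)}.$$

The two constraints are orthogonal in the sense of the geometry above, since $\sum\{1\cdot(m'/m)\cdot m\} = \sum(m') = \partial\sum(m)/\partial\theta = 0$. The two reductions are therefore simply additive. Thus the variance of $\sum(kn)$ is

$$\sum(k^2m) - \frac{1}{n}\Big\{\sum(km)\Big\}^2 - \frac{\{\sum(km')\}^2}{\sum(m'^2/m)}. \qquad (17.105)$$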
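The reduction argument can be checked numerically in the normal approximation to the Poisson $n$'s. For independent $n_j$ with variances $m_j$, the variance of $\sum(kn)$ given linear constraints $\sum(\alpha n)$ fixed is $\sum(k^2m) - b^{\mathsf T}G^{-1}b$, where $b_i = \sum(k\alpha_i m)$ and $G_{il} = \sum(\alpha_i\alpha_l m)$. This regression form is our own restatement, not the book's. The sketch below compares it with (17.105). It uses made-up values of $m$, $m'$ (chosen so that $\sum m' = 0$) and $k$. Because the constraints $\alpha = 1$ and $\alpha = m'/m$ are orthogonal, the two should agree.

```python
def conditional_variance(k, m, constraints):
    # Var(sum k*n | sum a*n fixed for each a in constraints), with the n_j independent
    # and var n_j = m_j (Gaussian approximation): k'Mk - b' G^{-1} b, for 1 or 2 constraints
    kk = sum(ki * ki * mi for ki, mi in zip(k, m))
    b = [sum(ki * ai * mi for ki, ai, mi in zip(k, a, m)) for a in constraints]
    G = [[sum(ai * ci * mi for ai, ci, mi in zip(a, c, m)) for c in constraints]
         for a in constraints]
    if len(constraints) == 1:
        return kk - b[0] ** 2 / G[0][0]
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    inv = [[G[1][1] / det, -G[0][1] / det], [-G[1][0] / det, G[0][0] / det]]
    return kk - sum(b[i] * inv[i][j] * b[j] for i in range(2) for j in range(2))

def formula_17_105(k, m, mp):
    # sum(k^2 m) - (1/n){sum(km)}^2 - {sum(k m')}^2 / sum(m'^2/m)
    n = sum(m)
    kk = sum(ki * ki * mi for ki, mi in zip(k, m))
    km = sum(ki * mi for ki, mi in zip(k, m))
    kmp = sum(ki * mpi for ki, mpi in zip(k, mp))
    info = sum(mpi * mpi / mi for mpi, mi in zip(mp, m))
    return kk - km ** 2 / n - kmp ** 2 / info

m = [2.0, 3.0, 5.0, 4.0, 6.0]      # hypothetical expected frequencies
mp = [0.5, -1.0, 0.3, 0.7, -0.5]   # hypothetical dm/dtheta, summing to zero
k = [1.0, -2.0, 0.5, 3.0, 1.5]     # arbitrary coefficients
ones = [1.0] * len(m)
ratio = [a / b for a, b in zip(mp, m)]
general = conditional_variance(k, m, [ones, ratio])
print(general, formula_17_105(k, m, mp))
```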
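Now take

$$k = \frac{m''}{m} - \frac{m'^2}{m^2}.$$

Remember that

$$\operatorname{var}t = \frac{1}{\sum(m'^2/m)}, \qquad \sum(km) = \sum(m'') - \sum\frac{m'^2}{m} = -\sum\frac{m'^2}{m},$$

the last because $\sum(m'') = \partial^2\sum(m)/\partial\theta^2 = 0$. Then we see from (17.102) and the variance just found that the loss of information is, for large samples,

$$\frac{1}{\sum(m'^2/m)}\left[\sum m\left(\frac{m''}{m} - \frac{m'^2}{m^2}\right)^2 - \frac{1}{n}\Big\{\sum\frac{m'^2}{m}\Big\}^2 - \frac{\big\{\sum m'\big(\frac{m''}{m} - \frac{m'^2}{m^2}\big)\big\}^2}{\sum(m'^2/m)}\right]. \qquad (17.106)$$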


By considering the width of the groups as tending to zero we may apply this result 
also to continuous distributions. 


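Example 17.16

In the distribution

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty,$$

there is no sufficient estimator, as we have seen. Let us consider the loss of information consequent upon using the maximum likelihood estimator.

We may write for our “expected” value $m$

$$m = \frac{n}{\pi}\,\frac{dx}{1 + (x - \theta)^2}.$$

Hence, putting $p = x - \theta$,

$$\sum\frac{m'^2}{m} = \frac{4n}{\pi}\int_{-\infty}^{\infty}\frac{p^2\,dp}{(1+p^2)^3} = \frac{n}{2},$$

$$\sum m\left(\frac{m''}{m} - \frac{m'^2}{m^2}\right)^2 = \frac{4n}{\pi}\int_{-\infty}^{\infty}\frac{(p^2-1)^2\,dp}{(1+p^2)^5} = \frac{7n}{8},$$

while $\sum m'(m''/m - m'^2/m^2)$ vanishes, its integrand being odd in $p$. Hence, from (17.106), the loss of information is

$$\frac{7}{4} - \frac{1}{2} - 0 = \frac{5}{4}.$$

The intrinsic accuracy of the original distribution is $\tfrac12$, so the loss of information is equivalent to $2\tfrac12$ observations for large samples. For small samples it will presumably be smaller, since it vanishes for samples of one. The loss by use of the maximum likelihood estimator is therefore very slight and becomes of diminishing importance as the size of the sample increases.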
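In the continuous limit, put $m = nf\,dx$ in (17.106). The loss then reduces to $E(l_2^2)/I - I - \{E(l_1l_2)\}^2/I^2$, where $l_1 = \partial\log f/\partial\theta$, $l_2 = \partial^2\log f/\partial\theta^2$ and $I = E(l_1^2)$. This per-observation form is our own rewriting of (17.106). The sketch below evaluates it for the Cauchy case using the analytic derivatives. It uses the substitution $x - \theta = \tan\phi$, so that $dF = d\phi/\pi$. The result should be a loss of $5/4$, or $2\tfrac12$ observations.

```python
import math

def cauchy_moments(n=4000):
    # expectations over the standard Cauchy (theta = 0) via x = tan(phi), dF = dphi / pi
    h = math.pi / n
    e_l1_sq = e_l2_sq = e_l1_l2 = 0.0
    for i in range(n):
        x = math.tan(-math.pi / 2 + (i + 0.5) * h)
        d = 1.0 + x * x
        l1 = 2.0 * x / d                    # d log f / d theta at theta = 0
        l2 = -2.0 * (1.0 - x * x) / d ** 2  # d^2 log f / d theta^2 at theta = 0
        e_l1_sq += l1 * l1
        e_l2_sq += l2 * l2
        e_l1_l2 += l1 * l2
    w = h / math.pi
    return e_l1_sq * w, e_l2_sq * w, e_l1_l2 * w

I, E_l2_sq, E_l1_l2 = cauchy_moments()
loss = E_l2_sq / I - I - E_l1_l2 ** 2 / I ** 2
print(I, loss, loss / I)  # intrinsic accuracy, loss, loss in observations
```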


Ancillary Estimators 

17.40. Where no sufficient estimator exists no single estimator can avoid the loss of information. But we may take an additional function of the variables which, together with the maximum likelihood estimator, will give an accuracy tending to unity in large samples. By taking a third function we can improve the accuracy still further, and so on. The process is analogous to approximating to the value of a function (the likelihood function) by ascertaining its differential coefficients at some particular point of the range.

In fact, suppose that, in addition to the estimator which gives $\partial\log L/\partial\theta$ for some value of $\theta$ such as $t$, we also find $\partial^2\log L/\partial\theta^2$ for that value. Consider the variance of $\partial\log L/\partial\theta$ over values in the neighbourhood of those for which these two are constant. To the first approximation, it is then the variance of

$$\tfrac12(t - \theta)^2\,\frac{\partial^3\log L}{\partial\theta^3},$$

which has ordinarily a mean value and variance of lower order in $n$.

In particular, suppose $t$ is the maximum likelihood estimator, so that $(\partial\log L/\partial\theta)_t = 0$. The value of $(\partial^2\log L/\partial\theta^2)_t$ may then provide supplementary information, which enables us to approximate more closely to the likelihood function and hence salvage some of the lost information. Such a quantity is accordingly called an ancillary estimator. Cf. 17.29 above.
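The sketch below makes the idea concrete for the Cauchy distribution of Example 17.16. It computes the maximum likelihood estimator $t$ and reports the ancillary quantity $-(\partial^2\log L/\partial\theta^2)_t$ alongside it. The safeguarded Newton iteration started at the sample median is our own illustrative choice, not a procedure given in the text. All the function names are likewise hypothetical.

```python
import math
import statistics

def log_lik(xs, theta):
    # Cauchy log-likelihood, apart from a constant
    return -sum(math.log1p((x - theta) ** 2) for x in xs)

def score(xs, theta):
    # d log L / d theta
    return sum(2 * (x - theta) / (1 + (x - theta) ** 2) for x in xs)

def observed_info(xs, theta):
    # -d^2 log L / d theta^2: the ancillary quantity when evaluated at t
    return sum(2 * (1 - (x - theta) ** 2) / (1 + (x - theta) ** 2) ** 2 for x in xs)

def cauchy_mle(xs, tol=1e-12, max_iter=200):
    theta = statistics.median(xs)
    for _ in range(max_iter):
        g = score(xs, theta)
        h = observed_info(xs, theta)
        step = g / h if h > 0 else g  # Newton step where log L is concave, else an ascent step
        # halve the step until the log-likelihood does not fall
        while log_lik(xs, theta + step) < log_lik(xs, theta) and abs(step) > 1e-15:
            step /= 2
        theta += step
        if abs(step) < tol:
            break
    return theta

sample = [0.3, -1.2, 2.5, 0.8, -0.4]
t = cauchy_mle(sample)
print(t, score(sample, t), observed_info(sample, t))
```

Two samples with the same $t$ can give quite different values of the second quantity. That difference is exactly the supplementary information the ancillary estimator is meant to carry.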


Multivariate Distributions with One Parameter

17.41. We now proceed to consider the extension of some of the foregoing results in two directions: (a) where there is more than one variate but still only one parameter, and (b) where there is more than one parameter to be estimated.

The former raises no new point of difficulty. To take the bivariate case as an example, if the frequency function is $f(x, y, \theta)$, the likelihood is

$$L = f(x_1, y_1, \theta)\cdots f(x_n, y_n, \theta), \qquad (17.107)$$

and our maximum likelihood estimator is obtained by maximising $L$ in the usual way.


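Example 17.17

To estimate the parameter $\rho$ in samples of $n$ from

$$dF = \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left\{-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right\}dx\,dy, \qquad -\infty < x, y < \infty.$$

We find

$$\log L = \text{constant} - \frac{n}{2}\log(1-\rho^2) - \frac{1}{2(1-\rho^2)}\Big\{\sum(x^2) - 2\rho\sum(xy) + \sum(y^2)\Big\},$$

whence, for $\partial\log L/\partial\rho = 0$, we have

$$\frac{n\rho}{1-\rho^2} + \frac{\sum(xy)}{1-\rho^2} - \frac{\rho\{\sum(x^2) - 2\rho\sum(xy) + \sum(y^2)\}}{(1-\rho^2)^2} = 0,$$

reducing to the cubic in $\rho$,

$$\rho(1-\rho^2) + (1+\rho^2)\,\frac{1}{n}\sum(xy) - \rho\,\frac{1}{n}\sum(x^2 + y^2) = 0.$$

It is interesting to note that this does not yield the product-moment correlation of the sample.

We have, after a little reduction,

$$\frac{\partial^2\log f}{\partial\rho^2} = \frac{1+\rho^2}{(1-\rho^2)^2} - \frac{(1+3\rho^2)(x^2+y^2)}{(1-\rho^2)^3} + \frac{2\rho(3+\rho^2)\,xy}{(1-\rho^2)^3}.$$

Since $E(x^2) = E(y^2) = 1$ and $E(xy) = \rho$, we have, for the estimator $\hat\rho$,

$$\frac{1}{n\operatorname{var}\hat\rho} = -\frac{1+\rho^2}{(1-\rho^2)^2} + \frac{2(1+3\rho^2)}{(1-\rho^2)^3} - \frac{2\rho^2(3+\rho^2)}{(1-\rho^2)^3} = \frac{1+\rho^2}{(1-\rho^2)^2},$$

whence

$$\operatorname{var}\hat\rho = \frac{(1-\rho^2)^2}{n(1+\rho^2)}.$$

This is less (and may be considerably less) than the variance of the sample product-moment correlation in large samples, $(1-\rho^2)^2/n$. The efficiency of the latter is $1/(1+\rho^2)$.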
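The cubic is easy to solve numerically. The sketch below is a minimal illustration with hypothetical function names. It finds $\hat\rho$ by bisection on $(-1, 1)$. Since $\frac1n\sum(x^2+y^2) \ge \frac2n|\sum xy|$, the cubic is non-negative at $\rho = -1$ and non-positive at $\rho = 1$, so a root lies in between.

```python
def rho_mle(xs, ys, iters=200):
    n = len(xs)
    a = sum(x * x + y * y for x, y in zip(xs, ys)) / n  # (1/n) sum (x^2 + y^2)
    b = sum(x * y for x, y in zip(xs, ys)) / n          # (1/n) sum xy

    def cubic(r):
        # rho(1 - rho^2) + (1 + rho^2) b - rho a, from d log L / d rho = 0
        return r * (1 - r * r) + (1 + r * r) * b - r * a

    lo, hi = -1.0, 1.0  # cubic(-1) = a + 2b >= 0 and cubic(1) = 2b - a <= 0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if cubic(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def efficiency_of_r(rho):
    # large-sample efficiency of the sample correlation relative to the MLE
    return 1 / (1 + rho * rho)

xs = [1.0, 1.0, -1.0, -1.0]
ys = [1.0, 1.0, -1.0, 1.0]
print(rho_mle(xs, ys), efficiency_of_r(0.5))
```

In the usage data $\frac1n\sum x^2 = \frac1n\sum y^2 = 1$ and $\frac1n\sum xy = \tfrac12$. The cubic then factorises as $(\tfrac12 - \rho)(1 + \rho^2)$, so $\hat\rho = \tfrac12$.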


Simultaneous Estimation of Several Parameters

17.42. We now turn to the case when the unknown parameters are more than one in number. To simplify the exposition we shall consider the case of two parameters $\theta_1$ and $\theta_2$. Examples not infrequently arise where more than two have to be estimated; for instance, in the fitting of certain Pearson curves there are four. To fix the ideas, consider the normal distribution

$$dF = \frac{1}{\theta_2\sqrt{2\pi}}\exp\left\{-\frac{(x-\theta_1)^2}{2\theta_2^2}\right\}dx, \qquad -\infty < x < \infty.$$

The likelihood function, except for constants, is given by

$$\log L = -n\log\theta_2 - \frac{1}{2\theta_2^2}\sum(x-\theta_1)^2. \qquad (17.108)$$

It is natural to generalise our principle of estimation by looking for estimators which shall maximise $L$ for independent simultaneous variations of $\theta_1$ and $\theta_2$, i.e. to require that

$$\frac{\partial\log L}{\partial\theta_1} = 0, \qquad \frac{\partial\log L}{\partial\theta_2} = 0. \qquad (17.109)$$

In our case this leads to

$$\sum(x-\theta_1) = 0, \qquad -\frac{n}{\theta_2} + \frac{1}{\theta_2^3}\sum(x-\theta_1)^2 = 0,$$

whence for the estimators $\hat\theta_1$ and $\hat\theta_2$

$$\hat\theta_1 = \frac{1}{n}\sum(x) = \bar x, \qquad (17.110)$$

$$\hat\theta_2^2 = \frac{1}{n}\sum(x-\bar x)^2. \qquad (17.111)$$

Thus the sample mean and variance are estimates of the population mean and variance. We note incidentally that the estimator $\hat\theta_2^2$ is biassed.


17.43. There is one possible source of confusion here which should be removed. If we know $\theta_1$, then $\hat\theta_2$ is given by

$$\hat\theta_2^2 = \frac{1}{n}\sum(x-\theta_1)^2, \qquad (17.112)$$

which is not the same as (17.111), the sample mean $\bar x$ having been replaced by the known quantity $\theta_1$. Suppose then we estimate $\theta_1$ by $\bar x$, as we may do whether we know $\theta_2$ or not, since (17.110) does not contain $\theta_2$. We may then ask: what is the estimator of $\theta_2$ which maximises the likelihood for all samples giving the ascertained value of $\hat\theta_1$, namely $\bar x$?

This is an entirely different question from the one which gave rise to (17.111), and we must not be surprised if it has a different answer. The variations of $L$ from sample to sample are now considered in a certain sub-population for which $\bar x$ has a fixed value.

In our particular case the problem can be solved explicitly. The likelihood function can be thrown into the following form, with variables $\bar x$ and $s$:

$$L\,d\bar x\,ds = \frac{1}{\theta_2\sqrt{2\pi/n}}\exp\left\{-\frac{n(\bar x-\theta_1)^2}{2\theta_2^2}\right\}d\bar x \times \frac{n^{(n-1)/2}\,s^{n-2}}{2^{(n-3)/2}\,\Gamma\{\tfrac12(n-1)\}\,\theta_2^{n-1}}\exp\left(-\frac{ns^2}{2\theta_2^2}\right)ds, \qquad (17.113)$$

where $s^2$ is the sample variance.

If we maximise the likelihood in this form for simultaneous variations of $\theta_1$ and $\theta_2$ we arrive back at (17.110) and (17.111), as of course we must. But if $\bar x$ has a fixed value, only the distribution of $s$, with one degree of freedom fewer, remains to be considered. The likelihood is then proportional to the second factor in (17.113), viz.

$$\frac{1}{\theta_2^{n-1}}\exp\left(-\frac{ns^2}{2\theta_2^2}\right),$$

and for variations of $\theta_2$ this is maximised by

$$\hat\theta_2^2 = \frac{n}{n-1}\,s^2 = \frac{1}{n-1}\sum(x-\bar x)^2. \qquad (17.114)$$

This, it may be noticed, is an unbiassed estimator.
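The sketch below compares the two estimators numerically on an arbitrary illustrative sample. Estimator (17.111) is computed directly. Estimator (17.114) is found by maximising the second factor of (17.113) over $\theta_2$ with a golden-section search. That search is our choice of optimiser, and the function names are hypothetical. The second result should be $n/(n-1)$ times the first.

```python
import math

def joint_mle_var(xs):
    # (17.111): theta2^2 = (1/n) sum (x - xbar)^2
    n = len(xs)
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / n

def conditional_mle_var(xs):
    # maximise -(n-1) log theta2 - n s^2 / (2 theta2^2), the log of the second factor of (17.113)
    n = len(xs)
    s2 = joint_mle_var(xs)

    def loglik(theta2):
        return -(n - 1) * math.log(theta2) - n * s2 / (2 * theta2 ** 2)

    lo, hi = 1e-6, 10 * math.sqrt(s2) + 1.0
    g = (math.sqrt(5) - 1) / 2  # golden-section search on a unimodal function
    for _ in range(200):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if loglik(c) > loglik(d):
            hi = d
        else:
            lo = c
    theta2 = (lo + hi) / 2
    return theta2 ** 2

xs = [4.1, 5.3, 3.8, 6.0, 5.2, 4.7]
print(joint_mle_var(xs), conditional_mle_var(xs))
```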


17.44. The difference between (17.111) and (17.114) is apt to be confusing, for both are, in a sense, maximum likelihood estimators. The distinction arises from the fact that we are considering the variation of $L$ in two different populations. The first is over all samples of size $n$. The second is over the more restricted samples subject to the further constraint $\sum(x) = $ constant. The difference when $n$ is large is, of course, quite unimportant, but as a theoretical matter the point has some interest.

Which of the two is employed for practical estimation is a matter of choice. At first sight it may strike the reader as objectionable to use (17.114), because $\bar x$ is not known before the sample is drawn, and there are obvious dangers in basing an inference on properties of the sample which are determined a posteriori. This objection, however, does not lie in the present case. We make up our mind beforehand that, whatever $\bar x$ may turn out to be, we will make an inference in relation to the sub-population of samples determined by it. There is, in fact, no posterior determination of the rule of inference.

17.45. Possibly without realising it, the reader is already accustomed to make an inference of this kind in relation to a sample number. We do not usually determine beforehand what size the sample must be. Our results (apart from the distinction between small and large samples, which is another matter) are true for any $n$, whatever $n$ may turn out to be in practice. In the same way the estimator (17.114) is a maximum likelihood estimator, whatever $\bar x$ may turn out to be, $\bar x$ being a property of the sample, just as $n$ is.

The fact remains, of course, that (17.111) and (17.114) give different results. Which is the better? The answer depends on what we require of the estimator. If we wish to choose $\hat\theta_1$ and $\hat\theta_2$ so as to maximise their joint likelihood we choose (17.111). If we wish to select them so that the likelihood is maximised for $\theta_1$ and then, for the observed $\bar x$, is maximised for $\theta_2$, we choose (17.114).

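17.46. It may be shown that, as for the case of one parameter, the maximum likelihood estimators of several parameters are consistent under very general conditions and tend for large $n$ to be distributed in the multivariate normal form. We omit the proof of these results, which the reader will probably be willing to accept, and proceed to a generalisation of the theorem of 17.26. Thus:

(a) if the frequency function $f(x, \theta_1, \theta_2, \ldots, \theta_p)$ is continuous in $x$, and

(b) if, in a certain interval containing the true values $\theta_{10}, \theta_{20}, \ldots, \theta_{p0}$, $\partial f/\partial\theta_j$ is continuous in $\theta_j$ for every $x$, approaches a continuous function of $\theta_j$ for large $n$, and does not vanish in some interval, then

$$n\operatorname{cov}(\hat\theta_j, \hat\theta_k) = \frac{\Delta_{jk}}{\Delta}, \qquad (17.115)$$

where $\Delta$ is the (Hessian) determinant

$$\Delta = \left|\int\left(\frac{\partial\log f}{\partial\theta_j}\right)\left(\frac{\partial\log f}{\partial\theta_k}\right)f\,dx\right|, \qquad j, k = 1, \ldots, p, \qquad (17.116)$$

and $\Delta_{jk}$ is the minor of the $j$th row and $k$th column. When $p = 1$ this reduces to the case of a single parameter.

As $n$ tends to infinity the joint distribution of the maximum likelihood estimators tends to the form

$$dF = \left(\frac{n}{2\pi}\right)^{p/2}\sqrt{A}\,\exp\left\{-\frac{n}{2}\sum_j\sum_k a_{jk}(\hat\theta_j - \theta_{j0})(\hat\theta_k - \theta_{k0})\right\}d\hat\theta_1\cdots d\hat\theta_p, \qquad (17.117)$$

where $A = |a_{jk}|$. The theorem will be established if we show that

$$\int\left(\frac{\partial\log f}{\partial\theta_j}\right)\left(\frac{\partial\log f}{\partial\theta_k}\right)f\,dx = a_{jk}, \qquad (17.118)$$

for then the values of the variances and covariances of the $\hat\theta$'s are as stated in (17.115) and (17.116). (Compare 15.12.)

Make the transformation

$$u_h = \sum_j\lambda_{hj}(\hat\theta_j - \theta_{j0}), \qquad (17.119)$$

and choose the $\lambda$'s so that the exponent of (17.117) becomes $-\tfrac12 n\sum_h u_h^2$, i.e.

$$\sum_h\lambda_{hj}\lambda_{hk} = a_{jk}. \qquad (17.120)$$

The $u$'s are independent normal variates with variance $1/n$. Hence, from the theorem for the case of a single parameter, already proved, we have

$$\int\left(\frac{\partial\log f}{\partial u_h}\right)^2 f\,dx = 1. \qquad (17.121)$$

Further, we have

$$\int\left(\frac{\partial\log f}{\partial u_g}\right)\left(\frac{\partial\log f}{\partial u_h}\right)f\,dx = 0, \qquad g \ne h, \qquad (17.122)$$

for if we put

$$v = \frac{u_g + u_h}{\sqrt2}, \qquad w = \frac{u_g - u_h}{\sqrt2},$$

the expression becomes one half of

$$\int\left\{\left(\frac{\partial\log f}{\partial v}\right)^2 - \left(\frac{\partial\log f}{\partial w}\right)^2\right\}f\,dx,$$

which vanishes since $v$ and $w$ have the same variance as the $u$'s. Now

$$\frac{\partial\log f}{\partial\theta_j} = \sum_h\frac{\partial\log f}{\partial u_h}\,\lambda_{hj}.$$

Hence

$$\int\left(\frac{\partial\log f}{\partial\theta_j}\right)\left(\frac{\partial\log f}{\partial\theta_k}\right)f\,dx = \sum_g\sum_h\lambda_{gj}\lambda_{hk}\int\frac{\partial\log f}{\partial u_g}\,\frac{\partial\log f}{\partial u_h}\,f\,dx = \sum_h\lambda_{hj}\lambda_{hk}$$

in virtue of (17.121) and (17.122), and this is $a_{jk}$ from (17.120). The theorem follows.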
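The sketch below computes the matrix whose determinant is (17.116) by numerical integration. It then applies (17.115) through signed minors. It assumes the normal distribution of 17.42 with $(\theta_1, \theta_2) = (0, 2)$, central-difference derivatives and a truncated integration range. These choices and the function names are illustrative only. The results should match $\operatorname{var}\hat\theta_1 = \theta_2^2/n$ and $\operatorname{var}\hat\theta_2 = \theta_2^2/(2n)$ with zero covariance.

```python
import math

def normal_logpdf(x, theta):
    # theta = (theta1, theta2) = (mean, standard deviation), as in 17.42
    mu, sigma = theta
    return -math.log(sigma) - 0.5 * math.log(2 * math.pi) - (x - mu) ** 2 / (2 * sigma ** 2)

def score(logpdf, x, theta, j, eps=1e-6):
    # d log f / d theta_j by a central difference
    up, dn = list(theta), list(theta)
    up[j] += eps
    dn[j] -= eps
    return (logpdf(x, up) - logpdf(x, dn)) / (2 * eps)

def information_matrix(logpdf, theta, lo, hi, steps=20000):
    # a_jk = integral of (d log f/d theta_j)(d log f/d theta_k) f dx, midpoint rule on (lo, hi)
    p = len(theta)
    h = (hi - lo) / steps
    a = [[0.0] * p for _ in range(p)]
    for i in range(steps):
        x = lo + (i + 0.5) * h
        w = math.exp(logpdf(x, theta)) * h
        s = [score(logpdf, x, theta, j) for j in range(p)]
        for j in range(p):
            for k in range(p):
                a[j][k] += s[j] * s[k] * w
    return a

def covariance_2x2(a, n):
    # n cov(j, k) = Delta_jk / Delta, with Delta_jk the signed minor, as in (17.115)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    cof = [[a[1][1], -a[1][0]], [-a[0][1], a[0][0]]]
    return [[cof[j][k] / (det * n) for k in range(2)] for j in range(2)]

theta = (0.0, 2.0)
a = information_matrix(normal_logpdf, theta, -20.0, 20.0)
cov = covariance_2x2(a, n=100)
print(a)
print(cov)
```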


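Example 17.18

Let us estimate the five parameters of the bivariate normal form

$$dF = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(x-\alpha)^2}{\sigma_1^2} - \frac{2\rho(x-\alpha)(y-\beta)}{\sigma_1\sigma_2} + \frac{(y-\beta)^2}{\sigma_2^2}\right\}\right]dx\,dy, \qquad -\infty < x, y < \infty.$$

It will be found that the partial differential coefficients of $\log L$ yield, on solution, the estimators

$$\hat\alpha = \bar x, \quad \hat\beta = \bar y, \quad \hat\sigma_1^2 = \frac1n\sum(x-\bar x)^2, \quad \hat\sigma_2^2 = \frac1n\sum(y-\bar y)^2, \quad \hat\rho\,\hat\sigma_1\hat\sigma_2 = \frac1n\sum(x-\bar x)(y-\bar y),$$

so that for simultaneous estimation the sample means, variances and covariance are estimates of the corresponding parameters.

To evaluate the sampling variances and covariances we have to evaluate integrals of the type

$$\iint\frac{\partial\log f}{\partial\theta_j}\,\frac{\partial\log f}{\partial\theta_k}\,f\,dx\,dy.$$

These are easily obtainable, being merely functions of moments of different orders. Taking the parameters $\alpha, \beta, \sigma_1, \sigma_2, \rho$ in that order, we find for the Hessian (17.116)

$$\begin{vmatrix}
\frac{1}{\sigma_1^2(1-\rho^2)} & \frac{-\rho}{\sigma_1\sigma_2(1-\rho^2)} & 0 & 0 & 0 \\
\frac{-\rho}{\sigma_1\sigma_2(1-\rho^2)} & \frac{1}{\sigma_2^2(1-\rho^2)} & 0 & 0 & 0 \\
0 & 0 & \frac{2-\rho^2}{\sigma_1^2(1-\rho^2)} & \frac{-\rho^2}{\sigma_1\sigma_2(1-\rho^2)} & \frac{-\rho}{\sigma_1(1-\rho^2)} \\
0 & 0 & \frac{-\rho^2}{\sigma_1\sigma_2(1-\rho^2)} & \frac{2-\rho^2}{\sigma_2^2(1-\rho^2)} & \frac{-\rho}{\sigma_2(1-\rho^2)} \\
0 & 0 & \frac{-\rho}{\sigma_1(1-\rho^2)} & \frac{-\rho}{\sigma_2(1-\rho^2)} & \frac{1+\rho^2}{(1-\rho^2)^2}
\end{vmatrix}$$

This confirms, what we know already, that the distribution of means is independent of variances and covariance. We may consider the $2\times2$ block in the top left-hand corner and the $3\times3$ block in the bottom right-hand corner separately. If the determinants of these blocks are $\Delta_1$ and $\Delta_2$ we have

$$\Delta_1 = \frac{1}{\sigma_1^2\sigma_2^2(1-\rho^2)}, \qquad \Delta_2 = \frac{4}{\sigma_1^2\sigma_2^2(1-\rho^2)^3}.$$

The (signed) minors of the second block will be found to be given by

$$\Delta_{\sigma_1\sigma_1} = \frac{2}{\sigma_2^2(1-\rho^2)^3}, \quad \Delta_{\sigma_2\sigma_2} = \frac{2}{\sigma_1^2(1-\rho^2)^3}, \quad \Delta_{\rho\rho} = \frac{4}{\sigma_1^2\sigma_2^2(1-\rho^2)},$$

$$\Delta_{\sigma_1\sigma_2} = \frac{2\rho^2}{\sigma_1\sigma_2(1-\rho^2)^3}, \quad \Delta_{\sigma_1\rho} = \frac{2\rho}{\sigma_1\sigma_2^2(1-\rho^2)^2}, \quad \Delta_{\sigma_2\rho} = \frac{2\rho}{\sigma_1^2\sigma_2(1-\rho^2)^2}.$$

Hence we find

$$\operatorname{var}\hat\alpha = \frac{\sigma_1^2}{n}, \quad \operatorname{var}\hat\beta = \frac{\sigma_2^2}{n}, \quad \operatorname{var}\hat\sigma_1 = \frac{\sigma_1^2}{2n}, \quad \operatorname{var}\hat\sigma_2 = \frac{\sigma_2^2}{2n}, \quad \operatorname{var}\hat\rho = \frac{(1-\rho^2)^2}{n}.$$

These results are already familiar. We have further

$$\operatorname{cov}(\hat\sigma_1, \hat\sigma_2) = \frac{\rho^2\sigma_1\sigma_2}{2n}, \quad \operatorname{cov}(\hat\alpha, \hat\beta) = \frac{\rho\sigma_1\sigma_2}{n}, \quad \operatorname{cov}(\hat\sigma_1, \hat\rho) = \frac{\rho(1-\rho^2)\sigma_1}{2n}, \quad \operatorname{cov}(\hat\sigma_2, \hat\rho) = \frac{\rho(1-\rho^2)\sigma_2}{2n}.$$

Hence the correlation between $\hat\sigma_1$ and $\hat\sigma_2$ is $\rho^2$, that between $\hat\alpha$ and $\hat\beta$ is $\rho$, and that between $\hat\rho$ and $\hat\sigma_1$ or $\hat\sigma_2$ is $\rho/\sqrt2$.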
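As a check on the $3\times3$ block, the sketch below inverts it numerically and compares the result with the closed forms. It uses the illustrative values $\sigma_1 = 1.5$, $\sigma_2 = 0.8$, $\rho = 0.6$, $n = 100$. The small Gauss-Jordan routine is our own helper, not a library function.

```python
import math

def invert(a):
    # Gauss-Jordan inversion of a small matrix; no pivoting is needed here
    # because the matrix is symmetric and positive definite
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = aug[col][col]
        aug[col] = [v / piv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def scale_block(s1, s2, rho):
    # per-observation Hessian entries for (sigma1, sigma2, rho) in the bivariate normal
    c = 1 - rho * rho
    return [
        [(2 - rho * rho) / (s1 * s1 * c), -rho * rho / (s1 * s2 * c), -rho / (s1 * c)],
        [-rho * rho / (s1 * s2 * c), (2 - rho * rho) / (s2 * s2 * c), -rho / (s2 * c)],
        [-rho / (s1 * c), -rho / (s2 * c), (1 + rho * rho) / (c * c)],
    ]

s1, s2, rho, n = 1.5, 0.8, 0.6, 100
cov = [[v / n for v in row] for row in invert(scale_block(s1, s2, rho))]
corr_rho_s1 = cov[0][2] / math.sqrt(cov[0][0] * cov[2][2])
print(cov[0][0], s1 ** 2 / (2 * n))
print(cov[2][2], (1 - rho ** 2) ** 2 / n)
print(corr_rho_s1, rho / math.sqrt(2))
```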
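Example 17.19

Consider the Type III distribution

$$dF = \frac{1}{\sigma\Gamma(p)}\left(\frac{x-\alpha}{\sigma}\right)^{p-1}\exp\left(-\frac{x-\alpha}{\sigma}\right)dx, \qquad \alpha \le x < \infty.$$

For the likelihood we have

$$\log L = -n\log\sigma - n\log\Gamma(p) + (p-1)\sum\log\frac{x-\alpha}{\sigma} - \sum\frac{x-\alpha}{\sigma}.$$

The three partial differential coefficients give

$$-(p-1)\sum\frac{1}{x-\alpha} + \frac{n}{\sigma} = 0,$$

$$-\frac{np}{\sigma} + \frac{1}{\sigma^2}\sum(x-\alpha) = 0,$$

$$-n\,\frac{d\log\Gamma(p)}{dp} + \sum\log\frac{x-\alpha}{\sigma} = 0.$$

For the Hessian, taking the parameters in the order $\alpha, \sigma, p$, we have

$$\Delta = \begin{vmatrix}
\frac{1}{(p-2)\sigma^2} & \frac{1}{\sigma^2} & \frac{1}{(p-1)\sigma} \\
\frac{1}{\sigma^2} & \frac{p}{\sigma^2} & \frac{1}{\sigma} \\
\frac{1}{(p-1)\sigma} & \frac{1}{\sigma} & \frac{d^2\log\Gamma(p)}{dp^2}
\end{vmatrix} = \frac{1}{\sigma^4}\left[\frac{1}{p-2}\left\{2\,\frac{d^2\log\Gamma(p)}{dp^2} - 1\right\} + \frac{p-2}{(p-1)^2}\right].$$

From this the sampling variances are found to be

$$\operatorname{var}\hat\alpha = \frac{1}{n\Delta\sigma^2}\left\{p\,\frac{d^2\log\Gamma(p)}{dp^2} - 1\right\},$$

$$\operatorname{var}\hat\sigma = \frac{1}{n\Delta\sigma^2}\left\{\frac{1}{p-2}\,\frac{d^2\log\Gamma(p)}{dp^2} - \frac{1}{(p-1)^2}\right\},$$

$$\operatorname{var}\hat p = \frac{1}{n\Delta\sigma^4}\left(\frac{p}{p-2} - 1\right) = \frac{2}{n\Delta\sigma^4(p-2)}.$$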

Sufficient Estimators for Several Parameters

17.47. As a natural generalisation from the case of one parameter, we shall say that $t_1, \ldots, t_p$ are jointly sufficient for $\theta_1, \ldots, \theta_p$ if, and only if, the likelihood function can be expressed as

$$L(x_1, \ldots, x_n, \theta_1, \ldots, \theta_p) = L_1(t_1, \ldots, t_p, \theta_1, \ldots, \theta_p)\,L_2(x_1, \ldots, x_n). \qquad (17.123)$$

It evidently does not follow that if $\theta_2, \ldots, \theta_p$ are known, $t_1$ is sufficient for $\theta_1$. This will be true only if the function $L_1$ may itself be factorised, e.g.

$$L_1(t_1, \ldots, t_p, \theta_1, \ldots, \theta_p) = L_{11}(t_1, \theta_1, \ldots, \theta_p)\,L_{12}(t_1, \ldots, t_p, \theta_2, \ldots, \theta_p). \qquad (17.124)$$

If a case occurred in which

$$L_1 = L_{11}(t_1, \theta_1)\,L_{12}(t_2, \theta_2)\cdots L_{1p}(t_p, \theta_p), \qquad (17.125)$$

we might say that each $t$ was sufficient for the corresponding $\theta$, or that the set of $t$'s was completely sufficient for the $\theta$'s. Such cases, however, are very rare.

Example 17.20

From (17.113) it is evident that $\bar x$ and $s$ are jointly sufficient for $m$ and $\sigma$. If $\sigma$ is known, $\bar x$ is sufficient for $m$, but if $m$ is known, $s$ is not sufficient for $\sigma$. The two are not completely sufficient.


17.48. The properties of sufficient estimators may be proved true, with certain modifications, for several parameters, but we shall not take the subject further except to quote one result.

Suppose $f(x, \theta_1, \ldots, \theta_p)$ is continuous and not zero over some continuous range of the $\theta$'s, and $\partial f/\partial\theta_j$ exists. Then it is necessary and sufficient for the existence of a set of jointly sufficient estimators that

$$f = \exp\left\{\sum_{k=1}^{p}A_kX_k + B + Y\right\}, \qquad (17.126)$$

where the $A_k$ and $B$ are arbitrary functions of the $\theta$'s, and the $X_k$ and $Y$ are arbitrary functions of $x$. (See Koopman, 1936.)


Example 17.21

The Type III distribution of Example 17.19 gives us

$$\log f = -p\log\sigma - \log\Gamma(p) + (p-1)\log(x-\alpha) - \frac{x-\alpha}{\sigma}.$$

If $\alpha$ is regarded as known, this may be put in the form

$$-\frac{x-\alpha}{\sigma} + (p-1)\log(x-\alpha) - p\log\sigma - \log\Gamma(p),$$

which is of type (17.126) with

$$A_1 = -\frac{1}{\sigma}, \quad X_1 = x - \alpha; \qquad A_2 = p - 1, \quad X_2 = \log(x-\alpha); \qquad B = -p\log\sigma - \log\Gamma(p).$$

Thus if $\alpha$ is known, there are sufficient estimators for $\sigma$ and $p$ jointly. It will be clear on inspection that if $\alpha$ is unknown there are no sufficient estimators, even if $\sigma$ and $p$ are known.


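Parameters of Location and Scale

17.49. Consider a frequency function expressed in the form

$$dF = \frac{1}{\beta}\,g\!\left(\frac{x-\alpha}{\beta}\right)dx. \qquad (17.127)$$

The parameter $\alpha$ may be regarded as locating the distribution and $\beta$ as determining its scale. In particular the normal distribution may be put in this form. We may write

$$dF = \frac{1}{\beta}\exp\phi(\xi)\,dx = \exp\phi(\xi)\,d\xi, \qquad (17.128)$$

where

$$\xi = \frac{x-\alpha}{\beta} \qquad\text{and}\qquad \phi(\xi) = \log g(\xi).$$

In samples of $n$ we have

$$\log L = \sum\phi(\xi) - n\log\beta,$$

giving for the maximum likelihood estimators

$$\frac{\partial\log L}{\partial\alpha} = -\frac{1}{\beta}\sum\phi'(\xi) = 0, \qquad \frac{\partial\log L}{\partial\beta} = -\frac{1}{\beta}\sum\xi\phi'(\xi) - \frac{n}{\beta} = 0, \qquad (17.129)$$

whence we may solve for $\hat\alpha$ and $\hat\beta$.

For the variances and covariance we find

$$\frac{\partial^2\log f}{\partial\alpha^2} = \frac{\phi''}{\beta^2}, \qquad \frac{\partial^2\log f}{\partial\alpha\,\partial\beta} = \frac{\phi' + \xi\phi''}{\beta^2}, \qquad \frac{\partial^2\log f}{\partial\beta^2} = \frac{1 + 2\xi\phi' + \xi^2\phi''}{\beta^2}. \qquad (17.130)$$

Since $E(\phi') = 0$ and, on integrating by parts, $E(\xi\phi') = -1$, the Hessian of (17.116) becomes

$$\frac{1}{\beta^4}\begin{vmatrix} -E(\phi'') & -E(\phi' + \xi\phi'') \\ -E(\phi' + \xi\phi'') & -E(\xi^2\phi'' - 1)\end{vmatrix}, \qquad (17.131)$$

from which the variances and covariance of $\hat\alpha$ and $\hat\beta$ may be determined in the usual way.

In (17.131) it would be a great convenience if the quantity $-E(\phi' + \xi\phi'')$ vanished, for then $\hat\alpha$ and $\hat\beta$ would be independent. By a suitable choice of origin we can, in fact, ensure that this is so. Put

$$\zeta = \xi - \frac{E(\phi''\xi)}{E(\phi'')}. \qquad (17.132)$$

Then $E(\phi''\zeta) = E(\phi''\xi) - E(\phi''\xi) = 0$, so that, since $E(\phi') = 0$,

$$E(\phi' + \zeta\phi'') = 0.$$

With this origin we have for the variances of the (uncorrelated) variables $\hat\alpha$ and $\hat\beta$

$$\operatorname{var}\hat\alpha = -\frac{\beta^2}{nE(\phi'')}, \qquad (17.133)$$

$$\operatorname{var}\hat\beta = -\frac{\beta^2}{nE(\zeta^2\phi'' - 1)}. \qquad (17.134)$$

The point of location so defined, namely, as that for which $\hat\alpha$ and $\hat\beta$ are uncorrelated, has been called by Fisher the centre of location.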
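The centre of location can also be found numerically from (17.132). The sketch below takes the Type III case of Example 17.19 with $p = 5$ assumed known, and integrates over $\xi$ by the midpoint rule. The function names and the truncation at $\xi = 80$ are illustrative choices. It should give an offset of $p - 2 = 3$ and $E(\phi'') = -1/(p-2)$. By (17.133) the latter corresponds to $\operatorname{var}\hat\alpha = (p-2)\beta^2/n$.

```python
import math

def centre_offset(phi2, log_density, lo, hi, steps=100000):
    # returns E(xi phi'') / E(phi'') and E(phi''), by the midpoint rule on (lo, hi)
    h = (hi - lo) / steps
    e2 = e_xi2 = 0.0
    for i in range(steps):
        xi = lo + (i + 0.5) * h
        w = math.exp(log_density(xi)) * h
        d2 = phi2(xi)
        e2 += d2 * w
        e_xi2 += xi * d2 * w
    return e_xi2 / e2, e2

p = 5.0  # shape, assumed known (needs p > 2)
offset, e_phi2 = centre_offset(
    lambda xi: -(p - 1) / xi ** 2,                            # phi'' for Type III
    lambda xi: (p - 1) * math.log(xi) - xi - math.lgamma(p),  # log g(xi)
    0.0, 80.0)
print(offset, e_phi2)
```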
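Example 17.22

For the normal distribution

$$dF = \frac{1}{\beta\sqrt{2\pi}}\exp\left\{-\frac{(x-\alpha)^2}{2\beta^2}\right\}dx$$

we have $\phi = -\tfrac12\xi^2 + $ constant, $\phi'' = -1$ and $E(\phi''\xi) = 0$.

Hence $\zeta = \xi$, and the origin chosen is itself the centre of location. From (17.133) and (17.134) we find the familiar results (for large samples)

$$\operatorname{var}\hat\alpha = \operatorname{var}\bar x = \frac{\beta^2}{n}, \qquad \operatorname{var}\hat\beta = \operatorname{var}s = \frac{\beta^2}{2n},$$

with $\bar x$ and $s$ uncorrelated.

Example 17.23

Consider again the Type III distribution

$$dF = \frac{1}{\beta\Gamma(p)}\left(\frac{x-\alpha}{\beta}\right)^{p-1}\exp\left(-\frac{x-\alpha}{\beta}\right)dx, \qquad \alpha \le x < \infty,$$

where we assume $p$ known. The condition $p > 1$ is required to ensure the vanishing of the frequency function at the extremity $x = \alpha$, and $p > 2$ to ensure the convergence of some of the mean values.

Here

$$\phi = \text{constant} - \xi + (p-1)\log\xi, \qquad \phi'' = -\frac{p-1}{\xi^2}.$$

Hence, since $E(1/\xi) = 1/(p-1)$ and $E(1/\xi^2) = 1/\{(p-1)(p-2)\}$,

$$E(\phi'') = -\frac{1}{p-2}, \qquad E(\phi''\xi) = -1.$$

Thus

$$\zeta = \xi - (p-2).$$

The centre of location is distant $(p-2)\beta$ to the right of the start of the distribution. In terms of $\zeta$ we have

$$\phi = \text{constant} - \zeta - (p-2) + (p-1)\log(\zeta + p - 2),$$

$$\phi' = -1 + \frac{p-1}{\zeta + p - 2}, \qquad \phi'' = -\frac{p-1}{(\zeta + p - 2)^2},$$

$$E(\phi'') = -\frac{1}{p-2}, \qquad E(\zeta^2\phi'' - 1) = -2.$$

Hence

$$\operatorname{var}\hat\alpha = \frac{(p-2)\beta^2}{n}, \qquad \operatorname{var}\hat\beta = \frac{\beta^2}{2n}.$$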
Efficiency of the Method of Moments

17.50. In previous chapters we have fitted distributions of the Pearson type to other distributions by identifying lower moments. We were there mainly concerned with the properties of populations only, and no question of the reliability of estimates arose. If, however, we regard the data as a sample from a population, the question arises whether fitting by moments provides the most efficient estimators of the unknown parameters. As we shall see presently, in general it does not.

Consider a parent form dependent on four parameters. Suppose the maximum likelihood estimators of these parameters are to be obtained in terms of linear functions of the moments (as in the fitting of Pearson curves). Then we must have

$$\frac{\partial\log L}{\partial\theta} = a_0 + a_1\sum(x) + a_2\sum(x^2) + a_3\sum(x^3) + a_4\sum(x^4), \qquad (17.135)$$

and consequently

$$f(x, \theta_1, \theta_2, \theta_3, \theta_4) = \exp\{b_0 + b_1x + b_2x^2 + b_3x^3 + b_4x^4\}, \qquad (17.136)$$

where the $b$'s depend on the $\theta$'s. This is the most general form for which the method of moments gives maximum likelihood estimators. The $b$'s are, of course, conditioned by the fact that the total frequency shall be unity and the distribution function converge.

Without loss of generality we may take $b_1 = 0$. If, then, the other $b$'s vanish except $b_0$ and $b_2$, the distribution is normal and the method of moments is most efficient. In other cases, (17.136) does not yield a Pearson distribution except as an approximation. For example,

$$\frac{d\log f}{dx} = 2b_2x + 3b_3x^2 + 4b_4x^3.$$

If $b_3$ and $b_4$ are small this is approximately

$$\frac{d\log f}{dx} = \frac{2b_2x}{1 - \frac{3b_3}{2b_2}x - \frac{2b_4}{b_2}x^2}, \qquad (17.137)$$

which is one form of the equation defining Pearson distributions (cf. 6.2). Only when $b_3$ and $b_4$ are small compared with $b_2$ can we expect the method of moments to give estimates of high efficiency.


17.51. A detailed discussion of the efficiency of moments in determining the parameters of a Pearson distribution has been given by Fisher (1921a). We will here quote only one of the results by way of illustration.

We found in Example 17.19 that the variance for large samples of the maximum likelihood estimator $\hat p$ is given by

$$\operatorname{var}\hat p = \frac{2}{n}\left\{2\,\frac{d^2\log\Gamma(p)}{dp^2} - 1 + \frac{(p-2)^2}{(p-1)^2}\right\}^{-1},$$

or, if $q = p - 1$, by

$$\operatorname{var}\hat p = \frac{1}{n}\left\{\frac{d^2\log\Gamma(1+q)}{dq^2} - \frac{1}{q} + \frac{1}{2q^2}\right\}^{-1}. \qquad (17.138)$$

Now for large $q$ *

$$\log\Gamma(1+q) = \left(q + \tfrac12\right)\log q - q + \tfrac12\log 2\pi + \frac{1}{12q} - \frac{1}{360q^3} + \frac{1}{1260q^5} - \cdots$$

We then find

$$\frac{d^2\log\Gamma(1+q)}{dq^2} - \frac{1}{q} + \frac{1}{2q^2} = \frac{1}{6q^3} - \frac{1}{30q^5} + \frac{1}{42q^7} - \cdots,$$

and hence, approximately,

$$\operatorname{var}\hat p = \frac{6}{n}\left(q^3 + \tfrac15 q\right). \qquad (17.139)$$

If we estimate the parameters by equating sample moments to the appropriate moments in terms of parameters, we find

$$\alpha + \sigma p = m_1', \qquad \sigma^2 p = m_2, \qquad 2p\sigma^3 = m_3,$$

so that, whatever $\alpha$ and $\sigma$ may be,

$$b_1 = \frac{m_3^2}{m_2^3} = \frac{4}{p}, \qquad (17.140)$$

where $b_1$ is the sample value of $\beta_1$. Now for estimation by the method of moments (cf. 9.22),

$$\operatorname{var}b_1 = \frac{\beta_1}{n}\left(4\beta_4 - 24\beta_2 + 36 + 9\beta_1\beta_2 - 12\beta_3 + 35\beta_1\right),$$

which for the present distribution is readily seen to reduce to

$$\operatorname{var}b_1 = \frac{96(p+1)(p+5)}{np^3}. \qquad (17.141)$$

Hence, from (17.140), we have for $p$, estimated by the method of moments,

$$\operatorname{var}p = \frac{p^4}{16}\operatorname{var}b_1 = \frac{6}{n}\,p(p+1)(p+5).$$

For large $q$ the efficiency of this estimator is then, from (17.139) with $p = 1 + q$,

$$E = \frac{q^3 + \tfrac15 q}{(q+1)(q+2)(q+6)},$$

which is evidently short of unity in many cases. When $q$ exceeds 38.1 ($\beta_1 = 0.102$) the efficiency is over 80 per cent. For $q = 19$ ($\beta_1 = 0.20$) it is 66 per cent. For $q = 4$ a more exact calculation based on the tables of the trigamma function shows that the efficiency is only 22 per cent.

* The series for the $\log\Gamma$ function is given in most books on advanced calculus, e.g. J. Edwards, Integral Calculus, vol. 2, article 942.
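The sketch below compares the two variances numerically. It assumes a series-based trigamma function of our own in place of the tabulated values mentioned in the text: a recurrence up to $z \ge 10$ followed by the asymptotic expansion. It evaluates the exact large-sample variance (17.138) against the moment variance $6p(p+1)(p+5)/n$, and also the large-$q$ approximation. It gives an efficiency of about 22 per cent at $q = 4$ and about 80 per cent at $q = 38.1$.

```python
def trigamma(z):
    # psi'(z) = d^2 log Gamma(z)/dz^2: recurrence up to z >= 10, then the asymptotic series
    acc = 0.0
    while z < 10.0:
        acc += 1.0 / (z * z)
        z += 1.0
    inv = 1.0 / z
    inv2 = inv * inv
    return acc + inv + inv2 / 2 + inv * inv2 * (1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30)))

def var_mle(q, n=1):
    # (17.138): n var p-hat = 1 / {psi'(1+q) - 1/q + 1/(2q^2)}, with q = p - 1
    return 1.0 / (n * (trigamma(1 + q) - 1 / q + 1 / (2 * q * q)))

def var_moments(q, n=1):
    # 6 p (p+1)(p+5) / n with p = 1 + q
    p = 1 + q
    return 6 * p * (p + 1) * (p + 5) / n

def efficiency(q):
    return var_mle(q) / var_moments(q)

def efficiency_large_q(q):
    # approximation based on (17.139)
    return (q ** 3 + q / 5) / ((q + 1) * (q + 2) * (q + 6))

for q in (4, 38.1, 100):
    print(q, round(efficiency(q), 3), round(efficiency_large_q(q), 3))
```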
NOTES AND REFERENCES

The greater part of this chapter is based on the researches of R. A. Fisher, the main papers being those of 1921a, 1925b and 1934a. The idea of maximising likelihood may be traced back to Gauss and was considered by Edgeworth. It may be regarded as beginning to exercise an influence on statistical theory only with the publication of Fisher's first paper in 1912.

The theorem giving the limiting variances and covariances of maximum likelihood estimates was proved (incorrectly) by Karl Pearson and Filon in 1898, before it was realised that it applied only to maximum likelihood. The necessary correction was given by Edgeworth (1908) and Fisher (1921a), but rigorous proofs were not available until the work of Hotelling (1930) and Doob (1934a and b, 1935, 1936). In the text we have followed Hotelling's treatment.

The inefficiency of moments in fitting distributions, pointed out by Fisher (1921a), has led to some controversy, for which see Koshal (1933, 1935), Myers (1934), Elderton and Hansmann (1934), K. Pearson (1936), and Fisher (1937a). The reader who pursues this subject so far as to read any one of these papers should read them all.

For work on sufficient estimators see Koopman (1936) and Pitman (1936, 1937b), who independently obtained the general form of distribution admitting such estimators. The theorem that sufficient estimators have the property of 17.27 is due to Fisher, rigorous proofs being provided by Neyman (1935a) and Dugué (1936a). Reference should also be made to papers by Bartlett (1936a, b, 1937c, 1938b, 1939a, 1940) on the problem of several parameters and what he calls “conditional” statistics. These are statistics similar to (17.114), obtained when $\bar x$ or some other function of the sample values is regarded as known. See also Neyman and Pearson (1936a).

Among recent papers, that by Pitman (1939a) on parameters of scale and location, and that by Welch (1939c) on the distribution of maximum likelihood estimates, are noteworthy.

Geary (1942) has recently proved a remarkable generalisation of the theorem that in large samples maximum likelihood estimators have minimum variance in the case of one parameter. In fact, for several parameters the maximum likelihood estimators minimise the “generalised variance” as defined in Chapter 28.


EXERCISES 


17.1. Let $t$ be a most-efficient estimator and $t'$ a less-efficient estimator with efficiency $E$, and let the correlation of $t$ and $t'$ be $\rho$. Consider the estimator $t''$ defined by

$$(1 + E - 2\rho\sqrt{E})\,t'' = (1 - \rho\sqrt{E})\,t + (E - \rho\sqrt{E})\,t'.$$

Show that $\rho = \sqrt{E}$ (for in the contrary case $\operatorname{var}t'' < \operatorname{var}t$).

(Fisher, 1925b.)


17.2. If in $n$ trials of an event with probability $p$ there are $x$ successes, show that a maximum likelihood estimator of $p$ is $x/n$. Find its sampling variance and show that it is sufficient.

17.3. Show that the distribution

$$dF = \tfrac12\exp\{-|x - \theta|\}\,dx, \qquad -\infty < x < \infty,$$

has a likelihood function for a sample of $n$ which is a maximum at the median if $n$ is odd, and between the $(n/2)$th and $(n/2 + 1)$th members if $n$ is even.


17.4. For the distribution of the previous exercise show that for a sample of $(2m + 1)$ members the median has an accuracy

(m + 1 ) J 2 m + 1 ) f- ( 2 m) ! 1 

(m — 1 ) \ (m 1)2 j 

Hence, as $m$ tends to infinity, the loss of information tends to $4\sqrt{m/\pi} - 4$. Thus, although the median is most-efficient, the loss of information in large samples does not tend to a constant.

(Fisher, 1925b.)

17.5. Show that if a most-efficient estimator $A$ and a less-efficient estimator $B$ tend to joint normality for large samples, $B - A$ tends to zero correlation with $A$.

Show that the error in $B$ may be regarded as composed (for large samples) of two parts which are independent: the error in $A$ and the error in $B - A$. The first may be regarded as sampling error, necessarily inherent in the problem of estimation. The second may be regarded as error due to the inefficiency of the estimator.

(Fisher, 1925b.)


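17.6. Show that the distribution of the median in a sample of $(2m + 1)$ observations from the population

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty,$$

is given by

$$dF = \frac{(2m+1)!}{(m!)^2\,\pi^{2m+1}}\left(\tfrac14\pi^2 - \psi^2\right)^m\frac{dx}{1 + (x - \theta)^2},$$

where $\tan\psi = x - \theta$ and $|\psi| < \tfrac12\pi$.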

Show hence that the accuracy of the median is 


2 

^ , 3m (^ -1- 1) , (wt + i) ! / 2 \*»+* 

2 “I ft /jjr’ ’i\ L.9. ‘ 


2 (m — 1 ) 


2m 


1 ' (!)”** w} 




2m -f- 3 


'w+t 


(27t) 


} 


where. J„ (z) is the Bessel function of order n and in particular {n) = (2jt) = 0 , 

J, (n) = J. (2n) = — and v 

71 * ^ 


71 


2n 


'n+l 




(Fisher, 1925b.)
17.7. Show that the most general continuous distribution for which the maximum likelihood estimator of a parameter $\theta$ is the geometric mean of the sample is

f(x, 6) = exp {ip (0) + f (*) }, 

where ip is an arbitrary function of 0, and f of ar. Show further that the corresponding 
distribution giving the harmonic mean is 

(Keynes, J.R.S.S. (1911), 74, 323.)

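17.8. Show that, if $m$ is known, the estimator

$$s = \left\{\frac{1}{n}\sum(x - m)^2\right\}^{1/2}$$

is sufficient for $\sigma$ in samples of $n$ from

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-m)^2}{2\sigma^2}\right\}dx, \qquad -\infty < x < \infty,$$

and find its distribution by the method of 17.31.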

17.9. By considering the distribution 

dF — dx, 0 < a? < 00 

show that the three forms of (17.97) are not necessarily equivalent when the range contains the parameter to be estimated.

(Pitman, 1936.) 


17.10. Show that if the frequency function is continuous and is zero at an extreme 
which is a function of 0, there still exists a maximum to the intrinsic accuracy, defined 

(Pitman, 1936.) 


17.11. By considering the distribution

$$dF = \frac{2x\,dx}{2\theta + 1}, \qquad \theta < x < \theta + 1,$$

show that the intrinsic accuracy is $4n^2/(2\theta + 1)^2$. Show further that the largest member of the sample is sufficient for $\theta$ and that its distribution is

$$dF = a(x)\,dx = \frac{2nx(x^2 - \theta^2)^{n-1}}{(2\theta + 1)^n}\,dx.$$

Hence show that

$$E\left(\frac{\partial\log a}{\partial\theta}\right)^2 = \frac{4n^2(\theta + 1)^2}{(2\theta + 1)^2} + \frac{4n\theta^2}{(n - 2)(2\theta + 1)^2},$$

so that the mean value in this case is greater than the intrinsic accuracy.

(Pitman, 1936.)
17.12. If the frequency function of an estimator $t$ is $g$, its accuracy is $E\{(\partial\log g/\partial\theta)^2\}$. If every possible sample with frequency $\phi$ gave a different value of $t$, the accuracy would be $E\{(\partial\log\phi/\partial\theta)^2\} = nI$ and would be independent of $t$. Show that the difference in accuracy may be expressed as

$$E\left\{\left(\frac{\partial\log\phi}{\partial\theta} - \frac{\partial\log g}{\partial\theta}\right)^2\right\}$$

and hence is not negative.

Hence show that the efficiency as defined in 17.36 cannot exceed unity, at least if the range is independent of $\theta$.

(Fisher, 1925b.)


17.13. Show that

$$dF = \frac{1}{\pi}\,\frac{\theta_2\,dx}{\theta_2^2 + (x - \theta_1)^2}, \qquad -\infty < x < \infty,$$

does not admit of a sufficient estimator for either parameter if the other is known, or
a pair of jointly sufficient estimators if both are unknown.

(Koopman, 1936.)

17.14. Show that if a distribution admits a sufficient estimator for either of two 
parameters when the other is known, it admits of a pair of jointly sufficient estimators 
when both parameters are unknown. 

(Koopman, 1936.) 

17.15. Show that the centre of location of the Type IV distribution 

p-f-2 

CO <X < 00 

to the left of the mode of the distribution. 


dF oc c 

where v and p are assumed known, is distant 


vp 


■! 4 


(Fisher, 1921a.) 


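17.16. For the distribution

$$dF = dx, \qquad \theta - \tfrac{1}{2} < x < \theta + \tfrac{1}{2},$$

show that, in large samples, the distribution of the mean tends to the form

$$dF = \sqrt{\frac{6n}{\pi}}\,\exp\{-6n(\bar{x} - \theta)^2\}\,d\bar{x}.$$

Show further that the distribution of the centre of the sample, say c (the mean of the two
extreme values), tends to

$$dF = n\exp\{-2n|c - \theta|\}\,dc.$$

Hence

$$\frac{\operatorname{var} c}{\operatorname{var} \bar{x}} \simeq \frac{6}{n},$$

so that the centre is a far better estimator of location than the mean for this distribution.

(Fisher, 1921a.)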
17.17. Show that for the Type I distribution

$$dF = \frac{1}{B(p, q)}\,x^{p-1}(1 - x)^{q-1}\,dx, \qquad 0 < x < 1,$$

the geometric mean of the sample values x and that of the values (1 − x) are jointly
sufficient for the estimation of p and q.

17.18. Show that all the Pearson distributions have sufficient estimators for some 
of the parameters if the others are assumed known, and ascertain which are the parameters 
concerned for each type. 

17.19. For the distribution of Exercise 17.15 show that the intrinsic accuracy for a is

I ( p + l)(p + 2) ip + 4) 
ip + 

and that the efficiency of the method of moments in locating the curve is 

p»(p-l){(p + 4r+vM 
(p f 1) (p + 2) (p + 4) (p2 

(Fisher, 1921a.) 


A.S.— n 



CHAPTER 18 

ESTIMATION: MISCELLANEOUS METHODS 

Minimum Variance 

18.1. We have seen in the previous chapter that under certain general conditions
the maximum likelihood estimator is most efficient for large samples, and that for finite
samples it leads to sufficient estimators where such exist. Sufficient estimators themselves
contain all the information in the sample about the parameter under estimate. What
we have not shown, however, is that maximum likelihood estimators have minimum variance
in finite samples.

We now consider the subject from a slightly different standpoint. Instead of beginning
with the criteria of efficiency and sufficiency and showing that they lead to certain
minimal properties, we shall examine the class of estimators which (a) are unbiassed and
(b) have minimum variance. The minimal property is here taken as the starting-point.


18.2. Consider, then, a frequency function f(x, θ), and as usual let us write
L = f(x_1, θ) ... f(x_n, θ). Then, writing ∫ dx for the n-fold integral over the range
of the x's, we have to find t = t(x_1, ..., x_n) such that

$$\int t L\,dx = \theta, \qquad (18.1)$$

$$\int (t - \theta)^2 L\,dx = \text{minimum}. \qquad (18.2)$$

The first equation may also be written

$$\int_{-\infty}^{\infty} (t - \theta) L\,dx = 0. \qquad (18.3)$$

The problem of finding t is one of the familiar problems in the Calculus of Variations. The
minimal value of (18.2) has to be found subject to the condition (18.1), which, on differentiating
with respect to θ, is equivalent to

$$\int t\,\frac{\partial L}{\partial \theta}\,dx = 1, \qquad (18.4)$$

provided that the range of f is independent of θ or that f vanishes at any extreme which
depends on θ.

If 2λ is an unspecified parameter (which may depend on θ but not on the x's) the
problem is equivalent to finding an unconditioned minimum of

$$\int \left\{(t - \theta)^2 L - 2\lambda t\,\frac{\partial L}{\partial \theta}\right\} dx. \qquad (18.5)$$

The solution is*

$$(t - \theta) L - \lambda\,\frac{\partial L}{\partial \theta} = 0, \qquad (18.6)$$

or

$$t = \theta + \lambda\,\frac{\partial \log L}{\partial \theta}. \qquad (18.7)$$

Thus there exists a t satisfying our conditions, where t is a function of the x's but not of θ,
if we can express ∂ log L/∂θ in the form

$$\frac{\partial \log L}{\partial \theta} = \frac{t - \theta}{\lambda}. \qquad (18.8)$$

This is a necessary and sufficient condition, except that it gives only stationary values of
(18.2) which might, for instance, be maxima instead of minima. This is not a point,
however, which need detain us from the statistical viewpoint, troublesome as it is to the
mathematician.

* See, for example, J. Edwards, Integral Calculus, vol. 2, article 1504, or A. R. Forsyth, Calculus
of Variations, article 15. Since the expression to be minimised does not contain dt/dx, the Euler equation
for a stationary value of the integral ∫ F dx reduces to ∂F/∂t = 0. The derivation of (18.7) is not,


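Example 18.1

To estimate θ in the normal population

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x - \theta)^2}{2\sigma^2}\right\}dx, \qquad -\infty < x < \infty,$$

where σ is assumed known.

We have

$$\frac{\partial \log L}{\partial \theta} = \frac{n}{\sigma^2}(\bar{x} - \theta).$$

This can be put in the form (18.8) by taking

$$t = \bar{x} \quad\text{and}\quad \lambda = \frac{\sigma^2}{n},$$

and hence x̄ is the required estimator. We note that it has minimum variance for any
n in the class of unbiassed estimators of θ.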


Example 18.2

To estimate θ in

$$dF = \frac{1}{\pi}\,\frac{dx}{1 + (x - \theta)^2}, \qquad -\infty < x < \infty.$$

We have

$$\frac{\partial \log L}{\partial \theta} = 2\sum \frac{x - \theta}{1 + (x - \theta)^2}.$$

This cannot be put in the form (18.8) and the method fails. There is no estimator which 
is unbiassed and has minimum variance. 


however, without its difficulties, and I think some conditions have been accidentally suppressed in
the Aitken-Silverstone method. I understand that Dr. Leon Solomon, working with Dr. Aitken, has 
obtained a proof which depends on the fact that L shall be the product of n independent frequency 
functions. But for the war the point would doubtless have been cleared up by now, but at present 
it remains open. 
18.3. Integrating (18.8) with respect to θ we have

$$\log L = \alpha(\theta)(t - \theta) + \beta(\theta) + \gamma(x_1, \ldots, x_n),$$

where α, β, γ are arbitrary functions (apart from the fact that the two former depend on
λ). Hence

$$\log f(x, \theta) = A(\theta)(t - \theta) + B(\theta) + C(x)$$

$$= p(\theta)\,t(x) + q(\theta) + r(x), \text{ say.} \qquad (18.9)$$

Comparing this with (17.83), we see that the method of minimum variance will give a
solution only if there exists a sufficient estimator. This explains the success of the method
in Example 18.1 (where x̄ is sufficient) and its failure in Example 18.2 (where no sufficient
estimator exists).


18.4. In the method of maximum likelihood it makes no difference to the final
result whether we estimate for a parameter θ or for some other parameter τ functionally
related to θ. For

$$\frac{\partial \log L}{\partial \theta} = \frac{\partial \log L}{\partial \tau}\,\frac{d\tau}{d\theta}$$

and the two sides of the equation vanish together. In the method of minimum variance,
however, there is an interesting difference.

Suppose we wish to estimate θ in

$$dF = \frac{1}{\sqrt{2\pi\theta}}\exp\left(-\frac{x^2}{2\theta}\right)dx, \qquad -\infty < x < \infty.$$

We have

$$\frac{\partial \log L}{\partial \theta} = -\frac{n}{2\theta} + \frac{\sum(x^2)}{2\theta^2}$$

and this may be put in the form (18.8) with

$$t = \frac{\sum(x^2)}{n} \quad\text{and}\quad \lambda = \frac{2\theta^2}{n}.$$

If, however, we consider the parallel problem of estimating σ in

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x^2}{2\sigma^2}\right)dx, \qquad -\infty < x < \infty,$$

we find

$$\frac{\partial \log L}{\partial \sigma} = -\frac{n}{\sigma} + \frac{\sum(x^2)}{\sigma^3},$$

which cannot be put in the form (18.8). We thus reach the peculiar result that the method
will provide an estimator for σ² but not for σ. It follows that in general we may have
to estimate, not θ itself, but some function of θ, say τ(θ).


18.5. If a minimum-variance estimator exists for some τ(θ) we have

$$\frac{\partial \log L}{\partial \tau} = \frac{t - \tau}{\lambda},$$

which is equivalent to

$$\frac{\partial \log L}{\partial \theta} = \frac{d\tau}{d\theta}\,\frac{t - \tau}{\lambda(\theta)}. \qquad (18.10)$$

We estimate τ by putting it equal to t and thus we shall have, for the estimator,

$$\frac{\partial \log L}{\partial \theta} = \frac{d\tau}{d\theta}\,\frac{t - \tau(\theta)}{\lambda(\theta)} = 0. \qquad (18.11)$$

This is equivalent to the equation of maximum likelihood. The two are not, however,
identical. Maximum likelihood is not concerned with the existence of the function λ.
Minimum variance takes the function as fundamental, and when it exists the solution
(which is the same as the maximum likelihood solution) has minimum variance for all n
in the class of unbiassed estimators, not merely for large n.


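18.6. Let us suppose that θ is the parameter (transformed if necessary) for which
the estimating function is θ itself. Then we have for the minimum-variance estimator t

$$\operatorname{var} t = \int_{-\infty}^{\infty} (t - \theta)^2 L\,dx,$$

which, on substitution from (18.8), yields

$$\operatorname{var} t = \lambda^2 \int \left(\frac{\partial \log L}{\partial \theta}\right)^2 L\,dx \qquad (18.12)$$

$$= -\lambda^2 \int \frac{\partial^2 \log L}{\partial \theta^2}\,L\,dx, \qquad (18.13)$$

if the range is independent of θ or f vanishes at any extreme dependent on θ.
Now from (18.8) we find

$$\frac{\partial^2 \log L}{\partial \theta^2} = -\frac{1}{\lambda} - \frac{t - \theta}{\lambda^2}\,\frac{d\lambda}{d\theta},$$

and hence, substituting in (18.13) and remembering that ∫ (t − θ) L dx = 0, we find

$$\operatorname{var} t = \lambda. \qquad (18.14)$$

The variance of the minimum-variance estimator is thus simply the parameter λ. It also
follows from (18.12) to (18.14) that

$$\frac{1}{\operatorname{var} t} = E\left(\frac{\partial \log L}{\partial \theta}\right)^2 = -E\left(\frac{\partial^2 \log L}{\partial \theta^2}\right), \qquad (18.15)$$

so that the result we reached in Chapter 17, as a limiting form for large n, is now seen to
be exact for finite n under present conditions.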

Example 18.3

To estimate θ in the Type III form

$$dF = \frac{1}{\Gamma(p)\,\theta^p}\,x^{p-1}e^{-x/\theta}\,dx, \qquad 0 < x < \infty,\; p > 1,$$

where p is assumed known.

We have

$$\frac{\partial \log L}{\partial \theta} = -\frac{np}{\theta} + \frac{n\bar{x}}{\theta^2},$$

which is of the form (18.8) if

$$t = \frac{\bar{x}}{p}, \qquad \lambda = \frac{\theta^2}{np}.$$

Thus t is the minimum-variance estimator and has variance θ²/(np) for finite n, even though
the distribution is not normal. (Compare Example 17.8.)
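A small simulation can illustrate Example 18.3. The sketch below assumes Python's `random.gammavariate(p, θ)`, whose density is x^(p−1) e^(−x/θ)/{Γ(p) θ^p}, i.e. the Type III form above; the values θ = 2, p = 3, n = 10, the number of replications and the seed are arbitrary choices. The empirical variance of t = x̄/p should lie close to λ = θ²/(np).

```python
import random
import statistics


def min_variance_estimate(sample, p):
    """t = x̄/p, the minimum-variance estimator of θ in Example 18.3 (p known)."""
    return statistics.fmean(sample) / p


def simulate(theta=2.0, p=3.0, n=10, reps=20_000, seed=1):
    """Sampling mean and variance of t over many Type III samples of size n."""
    rng = random.Random(seed)
    estimates = [
        # gammavariate(p, theta) has density x^(p-1) e^(-x/theta) / (Γ(p) theta^p)
        min_variance_estimate([rng.gammavariate(p, theta) for _ in range(n)], p)
        for _ in range(reps)
    ]
    return statistics.fmean(estimates), statistics.variance(estimates)


mean_t, var_t = simulate()
print(mean_t, var_t, 2.0**2 / (10 * 3.0))   # compare var_t with λ = θ²/(np)
```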


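18.7. We may readily determine what function τ(θ) should be taken as the estimating
function. Taking the general form from (18.9),

$$\log f(x, \theta) = p(\theta)\,t(x) + q(\theta) + r(x),$$

we have

$$\log L = p\sum t(x) + nq + \sum r(x). \qquad (18.16)$$

Hence, if

$$\tau = -\frac{dq}{d\theta}\bigg/\frac{dp}{d\theta}, \qquad (18.17)$$

we have

$$\frac{\partial \log L}{\partial \tau} = n\,\frac{dp}{d\tau}\left\{\frac{1}{n}\sum t(x) - \tau\right\}, \qquad (18.18)$$

which is of the required form provided that

$$\frac{1}{\lambda} = n\,\frac{dp}{d\tau}. \qquad (18.19)$$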


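Example 18.4

Consider again the estimation of σ in

$$dF = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{x^2}{2\sigma^2}\right)dx, \qquad -\infty < x < \infty.$$

We have

$$\log f = -\tfrac{1}{2}\log(2\pi) - \log\sigma - \frac{x^2}{2\sigma^2},$$

whence

$$p(\sigma) = -\frac{1}{2\sigma^2}, \qquad t(x) = x^2, \qquad q = -\log\sigma.$$

Thus the appropriate value of τ, from (18.17), is

$$\tau = -\frac{dq}{d\sigma}\bigg/\frac{dp}{d\sigma} = \frac{1/\sigma}{1/\sigma^3} = \sigma^2,$$

which is thus determined as our estimating function. For the variance of the estimator
of τ we have

$$\lambda = \frac{2\sigma^4}{n},$$

the estimator itself being (1/n) Σ(x²).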
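The contrast drawn in 18.4 and Example 18.4 can be seen in simulation. This sketch (with arbitrarily chosen σ = 1.5, n = 8 and seed) draws normal samples with zero mean, computes the estimator Σx²/n of τ = σ², and compares its mean and variance with σ² and 2σ⁴/n. It also shows that the obvious estimator of σ itself, √(Σx²/n), is biassed: its expectation is σ√(2/n) Γ{(n + 1)/2}/Γ(n/2), which is less than σ.

```python
import math
import random
import statistics


def simulate(sigma=1.5, n=8, reps=40_000, seed=7):
    """Return the mean and variance of T = Σx²/n, and the mean of √T, over many normal samples."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        t_values.append(sum(x * x for x in xs) / n)   # estimator of τ = σ²
    roots = [math.sqrt(t) for t in t_values]           # naive estimator of σ
    return statistics.fmean(t_values), statistics.variance(t_values), statistics.fmean(roots)


sigma, n = 1.5, 8
mean_t, var_t, mean_root = simulate(sigma, n)
# Exact E√T = σ √(2/n) Γ((n + 1)/2) / Γ(n/2), which is below σ.
expected_root = sigma * math.sqrt(2 / n) * math.exp(math.lgamma((n + 1) / 2) - math.lgamma(n / 2))
print(mean_t, sigma**2)
print(var_t, 2 * sigma**4 / n)
print(mean_root, expected_root, sigma)
```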


Minimum χ²


18.8. We now turn to consider another principle which has been suggested for providing
estimators. If the data are grouped into cells with expected frequency typified
by λ_j and observed frequency by l_j, then the function

$$\chi^2 = \sum_j \frac{(l_j - \lambda_j)^2}{\lambda_j}, \qquad (18.20)$$

where

$$n = \sum(\lambda_j) = \sum(l_j), \qquad (18.21)$$

can, as we saw in Chapter 12, be used as a measure of closeness of fit. The method of
minimum χ² adopts this standpoint (which is, of course, arbitrary in the logical sense)
and attempts to determine the parameters such that χ² is a minimum.

In practice the method is not very easy to apply because of the difficulty of expressing
the λ's in terms of the parameter under estimate, θ. For some illustrations reference
may be made to Kirstine Smith (1916). We shall not consider the method at length
here for two reasons:

(a) it may be shown that for large samples the minimum-χ² estimator tends to
the maximum-likelihood estimator;

(b) there is a modification of the method, considered below, which is much easier
to apply.


18.9. For samples of fixed size n the distribution of the quantities l_j is multinomial,
and we have for the likelihood function

$$L = \frac{n!}{\prod_j (l_j!)}\prod_j \left(\frac{\lambda_j}{n}\right)^{l_j}. \qquad (18.22)$$

Thus

$$\log L = \text{constant} + \sum_j l_j \log\frac{\lambda_j}{l_j}. \qquad (18.23)$$

Now for large samples we may put

$$\lambda_j = l_j + a_j n^{1/2},$$

where a_j is finite, so that a_j n^{1/2} is small compared with l_j (|a_j n^{1/2}| < l_j); and E(a_j) = 0.
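Hence, from (18.23),

$$\log L = k + \sum l_j \log\left(1 + \frac{a_j n^{1/2}}{l_j}\right)$$

$$= k + \sum a_j n^{1/2} - \frac{1}{2}\sum \frac{a_j^2 n}{l_j} + O(n^{-1/2})$$

$$= k - \frac{1}{2}\sum \frac{(\lambda_j - l_j)^2}{l_j} + O(n^{-1/2}), \qquad (18.24)$$

since Σ a_j n^{1/2} = Σ(λ_j − l_j) = 0.

Now write

$$\chi'^2 = \sum \frac{(l_j - \lambda_j)^2}{l_j} = \sum \frac{\lambda_j^2}{l_j} - n. \qquad (18.25)$$

Then we see that, to order n^{−1/2}, L is maximised by minimising χ'². This latter quantity
is not the same as χ² because the denominator terms are l's instead of λ's. However, for
large n the difference is of order n^{−1/2}, for

$$\frac{(l_j - \lambda_j)^2}{l_j} - \frac{(l_j - \lambda_j)^2}{\lambda_j} = \frac{(\lambda_j - l_j)^3}{l_j\lambda_j} = O(n^{-1/2}).$$

Hence, to order n^{−1/2}, the estimates obtained by minimising either χ² or χ'² will be equivalent
to maximising L.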


18.10. The advantage of using χ'² instead of χ² in practice resides in the fact that
the denominators in the former are integral. However, if there are any empty cells (i.e.
those for which l_j = 0) the formula (18.25) requires some modification.

In the likelihood function, if l_j = 0, (λ_j/n)^{l_j} = 1 for all λ_j. The substitution

$$\lambda_j = l_j + a_j n^{1/2}$$

will give us, for the empty cells, a term in (18.24) equal to −Σ a_j n^{1/2} = −Σ λ_j = −M,
say. Hence we have

$$\chi'^2 = \sum \frac{(l_j - \lambda_j)^2}{l_j} + 2M, \qquad (18.26)$$

where the summation takes place over occupied cells and M is the sum of the theoretical
frequencies λ in the empty cells.


Example 18.5

As an example (Jeffreys, 1941) we consider a case where the maximum likelihood
estimator is known, so that a comparison may be made with the result given by
minimum χ'².

Col. (2) of the following table shows the frequency of women in the first class of Part II
of the Mathematical Tripos from 1910 to 1938 inclusive, that is, the number of years (out of
29) in which j women were placed in the first class. Assuming that this distribution follows
the Poisson distribution

$$\lambda_j = 29\,\frac{e^{-\theta}\theta^j}{j!},$$

we require to estimate θ.

    (1)           (2)        (3)                        (4)
  Number of    Frequency   Expectations λ for         (l − λ)²/l for
  firsts, j        l       θ = 1   θ = 1.5  θ = 2     θ = 1   θ = 1.5  θ = 2

      0            6        10.7     6.5     3.9       3.7     0.0     0.7
      1            8        10.7     9.7     7.9       0.9     0.4     0.0
      2           11         5.3     7.3     7.9       3.0     1.2     0.9
      3            3         1.8     3.6     5.2       0.5     0.1     1.6
      4            0         0.5     1.4     2.6        -       -       -
      5            1         0.1     0.4     1.0       0.8     0.4     0.0
  6 and over       0         0.0     0.1     0.5        -       -       -
     2M                                                1.0     3.0     6.2

  Totals          29                                   9.9     5.1     9.4

The sample mean (a sufficient estimator of θ) is in this case 44/29 = 1.52 with a standard
error √(1.52/29) = 0.23.

To apply minimum χ'² we have to express the theoretical frequencies in terms of θ.
This results in an unmanageable equation if we then substitute in χ'². Instead we calculate
the minimum by finding χ'² for some trial values of θ (in this case 1, 1.5 and 2) and
then interpolating.

The expectations λ for the three selected values of θ are shown in column (3) of the
table and the corresponding (l − λ)²/l in column (4). It is found that, writing θ = 1.5 + φ,
the values of χ'² may be represented by the quadratic

$$\chi'^2 = 5.1 - 0.5\phi + 18.2\phi^2.$$

The minimum of this is given by φ = 0.01, and hence our estimate of θ is 1.51, very close
to the value of 1.52 given by the maximum likelihood estimator.
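The computation of Example 18.5 is easily mechanised. The sketch below (function names hypothetical) recomputes the Poisson expectations exactly rather than to one decimal, evaluates χ'² of (18.26) with the 2M term for the empty cells, fits the quadratic in φ = θ − 1.5 through the three trial values, and compares its minimum with the sample mean 44/29. Because the table above uses rounded expectations, the fitted coefficients differ slightly from 5.1 − 0.5φ + 18.2φ², but the minimising θ still comes out close to 1.51.

```python
import math

# Frequencies l_j from column (2): j = 0..5 firsts; the "6 and over" cell is empty.
FREQ = {0: 6, 1: 8, 2: 11, 3: 3, 4: 0, 5: 1}
N = sum(FREQ.values())                      # 29 years, 1910 to 1938


def poisson_expectations(theta, n=N, top=5):
    """λ_j = n e^{-θ} θ^j / j! for j = 0..top, plus one lumped cell for j > top."""
    lam = [n * math.exp(-theta) * theta**j / math.factorial(j) for j in range(top + 1)]
    lam.append(n - sum(lam))                # expected frequency for "top + 1 and over"
    return lam


def chi2_prime(theta):
    """χ'² of (18.26): Σ (l − λ)²/l over occupied cells, plus 2M for the empty cells."""
    lam = poisson_expectations(theta)
    observed = [FREQ[j] for j in range(6)] + [0]
    occupied = sum((l - e) ** 2 / l for l, e in zip(observed, lam) if l > 0)
    m = sum(e for l, e in zip(observed, lam) if l == 0)
    return occupied + 2 * m


# Quadratic a + bφ + cφ² through θ = 1, 1.5, 2 (step h = 0.5 in φ = θ − 1.5).
h = 0.5
c_lo, c_mid, c_hi = (chi2_prime(t) for t in (1.0, 1.5, 2.0))
b = (c_hi - c_lo) / (2 * h)
c = (c_hi + c_lo - 2 * c_mid) / (2 * h**2)
theta_min_chi2 = 1.5 - b / (2 * c)

theta_ml = sum(j * l for j, l in FREQ.items()) / N   # sample mean, 44/29
print(c_mid, b, c)
print(theta_min_chi2, theta_ml)
```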

18.11. On theoretical grounds there seems no reason to use minimum χ² instead of
maximum likelihood. The method has some practical value, however, where the maximum
likelihood equations are difficult to solve. We can usually follow the device of the
example just given, find χ² or χ'² for some trial values of the parameter, and approximate
to the value which minimises χ² or χ'². Whether this is easier than finding the maximum
likelihood estimate in the same sort of way depends on the circumstances of the case, but
it may well be so when the frequency function is a tabulated integral, so that expected
frequencies for specified parameter-values can be readily obtained.

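18.12. In the manner of 17.39 we can estimate the loss of information occasioned
by the use of minimum χ². We have, for the minimum of χ²,

$$\frac{\partial}{\partial \theta}\sum \frac{(l - \lambda)^2}{\lambda} = 0,$$

which, since Σ(l − λ)²/λ = Σ l²/λ − n, reduces to

$$\sum \frac{l^2}{\lambda^2}\,\frac{\partial \lambda}{\partial \theta} = 0. \qquad (18.27)$$

Since Σλ = n for every θ, we have Σ ∂λ/∂θ = 0, so that (18.27) may be written
Σ {(l − λ)(l + λ)/λ²} ∂λ/∂θ = 0. Since (l + λ)/λ tends to the constant value 2 for large
samples, this is equivalent to the maximum likelihood equation

$$\sum \frac{l}{\lambda}\,\frac{\partial \lambda}{\partial \theta} = 0, \qquad (18.28)$$

confirming that maximum likelihood and minimum χ² give the same results in the limit.

Since

$$l^2 - \lambda^2 = 2\lambda(l - \lambda) + (l - \lambda)^2,$$

the deviation of Σ (l²/λ²) ∂λ/∂θ from its mean is

$$\sum \frac{l^2 - \lambda^2}{\lambda^2}\,\frac{\partial \lambda}{\partial \theta} = \sum \frac{(l - \lambda)^2}{\lambda^2}\,\frac{\partial \lambda}{\partial \theta}, \qquad (18.29)$$

the first term vanishing on summation. As in 17.39 we find the variance of this quantity
within samples for which ∂ log L/∂θ is constant. We have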


var r ifc (Z - A)2 = 2 r (k^X'^) - - [kX^) 

n 


^ 0 ' ‘2 V * 




X 


(t) 


and on substituting $k = \frac{1}{\lambda^2}\frac{\partial \lambda}{\partial \theta}$ we find


272 


27 * 


(xO, 


( 18 . 30 ) 


giving the loss of information. 

As the sample size increases, this quantity remains finite. It is interesting to observe,
however, that as the number of classes increases it also increases without limit, indicating
that minimum χ² breaks down for fine grouping.


“Inverse” Probability

18.13. According to Bayes' theorem (7.24), if h(θ) dθ is the prior probability of θ,
the posterior probability is given by

$$P(\theta \mid x_1, \ldots, x_n) \propto L(x_1, \ldots, x_n, \theta)\,h(\theta)\,d\theta. \qquad (18.31)$$

It is then easy to determine the “most probable” value of θ by maximising L h(θ) if we
know h(θ). The principles of inference with which we have been concerned up to the
present do not require the notion of the probability of θ and, even if they did, would not
give any guide to the nature of the function h(θ). In fact, to an adherent of the frequency
theory of probability, the prior probability of θ requires the distribution of θ in some form,
and if θ is merely an unknown constant it has no distribution (except the trivial one that
f = 1 when θ takes its true value and f = 0 elsewhere). The alternative school of thought
assumes the existence of h(θ) as denoting a prior measure of belief, but, in order to find
the most probable value of θ, has to make some further assumption as to its values comparable
to Bayes' postulate that for a finite range h is a constant.

We have already noted that on this assumption the maximisation of L is equivalent
to finding the value of θ with the greatest posterior probability. It is also interesting to
note that, whatever the form of h(θ), maximum likelihood tends to give the same estimator
as the method of maximising posterior probability for large n. In fact, for the maximisation
of P in (18.31) we have

$$\frac{\partial \log P}{\partial \theta} = \frac{\partial \log L}{\partial \theta} + \frac{\partial \log h}{\partial \theta} = 0. \qquad (18.32)$$

In ordinary cases the variance of ∂ log L/∂θ is of order n, whereas the second term is
independent of n. In the limit, therefore, the second term is negligible and we are reduced to
the likelihood equation

$$\frac{\partial \log L}{\partial \theta} = 0.$$
Least Squares 

18.14. The method of least squares bears an analogy to minimum χ². Suppose
we have an expression depending on a number of unknown parameters θ_1 ... θ_p, and
certain observed values x. This can be thrown into a form such as

$$k(x, \theta_1, \ldots, \theta_p) = 0, \qquad (18.33)$$

where k is a given function (not a frequency function). If we have n values of x and n > p
it is not in general possible to solve the n resulting equations of type (18.33) for the θ's. We then
consider the “residuals” k(x_i, θ_1 ... θ_p), and the principle of least squares states that
the values of θ_1 ... θ_p are to be chosen so that

$$\sum_i \{k(x_i, \theta_1, \ldots, \theta_p)\}^2 = \text{minimum}, \qquad (18.34)$$

or, in other words, so as to satisfy the p equations

$$\sum_i k(x_i, \theta_1, \ldots, \theta_p)\,\frac{\partial k(x_i, \theta_1, \ldots, \theta_p)}{\partial \theta_l} = 0, \qquad l = 1, \ldots, p. \qquad (18.35)$$

18.15. Consider the case when the residuals are all distributed normally with variance
σ². The logarithm of the likelihood is then (except for constants)

$$\log L = -n\log\sigma - \frac{1}{2\sigma^2}\sum_i \{k(x_i, \theta_1, \ldots, \theta_p)\}^2, \qquad (18.36)$$

and this is clearly maximised by minimising the sum (18.34). In this case, then, the method
of least squares is equivalent to the method of maximum likelihood. In other cases it
may give different results, and the justification for using it then becomes more or less
empirical.

18.16. The most important case occurring in statistical theory of the use of the
method of least squares concerns regression equations. We have already seen that the
coefficients of regression are, in effect, determined so as to minimise the sum of squares of
residuals (cf. 15.2). We also know that, for the multiple normal distribution, residuals
from the population regression lines are, in fact, normally distributed (15.13). For normal
variation, therefore, the method of least squares is equivalent to maximum likelihood so
far as concerns the simultaneous estimation of regression coefficients.

18.17. This is a convenient point to prove a theorem (due to Gauss) which in one
form or another is constantly occurring in statistical theory, particularly in connection
with the normal distribution. Suppose we have a population (not necessarily normal)
in which the regression of one variate y on the others x_0 (= 1), x_1, ..., x_p is given by

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p. \qquad (18.37)$$

The x's may be correlated among themselves and, in the extreme case, functionally related,
so that this case includes that of curvilinear regression for our present purposes. Suppose
that we have a sample of n values, where n > p. Denoting by Σ summation over these
n values, we determine the estimates of the β's by minimising the sum of squares, i.e.

$$\sum (y - \beta_0 - \beta_1 x_1 - \cdots - \beta_p x_p)^2.$$

Suppose that b_0 ... b_p are the solutions of this process. Then our regression formula is

$$y - b_0 - b_1 x_1 - \cdots - b_p x_p = 0. \qquad (18.38)$$

The observed residuals, obtained by substituting the observed values in this equation,
are typified by

$$e = y - b_0 - b_1 x_1 - \cdots - b_p x_p, \qquad (18.39)$$

whereas the “real” residuals are typified by

$$\varepsilon = y - \beta_0 - \beta_1 x_1 - \cdots - \beta_p x_p. \qquad (18.40)$$

We proceed to compare the sampling variances of e and ε and to show that

$$\operatorname{var} \varepsilon = \frac{n}{n - p - 1}\operatorname{var} e, \qquad (18.41)$$

provided that the residuals are uncorrelated.

Let us transform the observed values of the x's to new values ξ_0, ..., ξ_p (n values for
each) such that

$$\sum \xi_j x_k = \begin{cases} 1, & j = k, \\ 0, & j \neq k. \end{cases} \qquad (18.42)$$

This involves, for each ξ, p + 1 equations in n unknowns and is therefore possible in general.
We take each ξ_j to be a linear function of x_0, ..., x_p, so that Σ ξ_k y = b_k, the least-squares
estimate. We then have

$$\sum \xi_k(\varepsilon - e) = \sum \xi_k\{(b_0 - \beta_0) + (b_1 - \beta_1)x_1 + \cdots + (b_p - \beta_p)x_p\} = b_k - \beta_k.$$

But

$$\sum \xi_k e = \sum(\xi_k y) - \sum \xi_k\{b_0 + b_1 x_1 + \cdots + b_p x_p\} = b_k - b_k = 0.$$

Hence

$$\beta_k - b_k = -\sum \xi_k \varepsilon. \qquad (18.43)$$

Now

$$\sum e(\varepsilon - e) = \sum \{y - b_0 - \cdots - b_p x_p\}\{(b_0 - \beta_0) + (b_1 - \beta_1)x_1 + \cdots + (b_p - \beta_p)x_p\} = 0,$$

since the summations give terms the vanishing of which determines the b's. Hence

$$\sum \varepsilon^2 - \sum e^2 = \sum(\varepsilon - e)\varepsilon = \mathrm{S}_j (b_j - \beta_j)\sum x_j\varepsilon,$$

where S denotes summation over the (p + 1) values of j,

$$= \mathrm{S}_j \left(\sum \xi_j\varepsilon\right)\left(\sum x_j\varepsilon\right)$$

$$= \mathrm{S}_j \left\{\sum \xi_j x_j\varepsilon^2 + \text{cross-product terms in } \varepsilon\right\}$$

$$= \mathrm{S}_j \sum \xi_j x_j\varepsilon^2 + \text{cross-product terms}.$$

When we take expectations the cross-product terms vanish since the residuals are uncorrelated.
Hence, since Σ ξ_j x_j = 1 for each j,

$$E\left(\sum \varepsilon^2\right) - E\left(\sum e^2\right) = (p + 1)\operatorname{var}\varepsilon,$$

or

$$(n - p - 1)\operatorname{var}\varepsilon = n\operatorname{var} e, \qquad (18.44)$$

from which (18.41) follows at once.

For normal variation we shall consider this result from a slightly different viewpoint 
in Chapter 22. 
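The result (18.44) lends itself to a Monte Carlo check. This sketch assumes NumPy, uncorrelated normal residuals of common variance σ², and a fixed design with x_0 = 1 and p = 3 further regressors drawn at random (all numerical settings are arbitrary). It also constructs one admissible set of ξ's, namely the columns of X(X'X)⁻¹, where X is the n × (p + 1) matrix of x-values, and checks the conditions (18.42) and Σ ξ_k y = b_k.

```python
import numpy as np


def gauss_check(n=12, p=3, sigma=2.0, reps=20_000, seed=0):
    rng = np.random.default_rng(seed)
    # Fixed design: a column of ones (x_0 = 1) and p further regressors.
    X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
    beta = rng.normal(size=p + 1)                    # "true" coefficients β

    # One column per sample: real residuals ε and the resulting y values.
    eps = rng.normal(scale=sigma, size=(n, reps))
    Y = (X @ beta)[:, None] + eps
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)        # least-squares b for every sample
    E = Y - X @ B                                    # observed residuals e

    # Mean of Σe² divided by nσ² (the expectation of Σε²); compare with (n − p − 1)/n.
    ratio = (E**2).sum(axis=0).mean() / (n * sigma**2)

    # ξ's taken as linear functions of the x's: the columns of X (X'X)^{-1}.
    Xi = X @ np.linalg.inv(X.T @ X)
    xi_ok = np.allclose(Xi.T @ X, np.eye(p + 1)) and np.allclose(Xi.T @ Y[:, 0], B[:, 0])
    return ratio, xi_ok


ratio, xi_ok = gauss_check()
print(ratio, (12 - 3 - 1) / 12, xi_ok)
```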


NOTES AND REFERENCES 

The approach to minimum-variance estimators through the Calculus of Variations is
due to Aitken and Silverstone (1942). For minimum χ² see Smith (1916) and R. A.
Fisher (1922a, 1925b). For the modification χ'² see Jeffreys (1938b, 1939b, 1941).

A method of estimation essentially depending on the median has been proposed for 
use in quality control, but its value is as yet problematical. For an account of the technique 
see Simon (1941). 


EXERCISES 

18.1. From the property that the variance of a minimum-variance estimator is
equal to λ show that the most general distribution for which the sample mean is a sufficient
estimator is

$$f(x, \theta) = c(x, \sigma)\exp\left\{-\frac{(x - \theta)^2}{2\sigma^2}\right\},$$

where c is an arbitrary function and σ² is the variance of f.

Hence show that no Pearson curve other than the normal admits the sample-mean 
as a sufficient estimator, but that a Gram-Charlier series may do so. 

(Aitken and Silverstone, 1942.) 

18.2. If the function λ exists and



show that the variance of the estimator t is 

_ 1 
n 3a®’ 

where q is the function of 18.7. (Aitken and Silverstone, 1942.) 


18.3. If a population $(p + q)^4$ is regarded as distributed in 5 classes, show that the
intrinsic accuracy is 4/(pq). Show further that the loss of information through estimating
p from minimum χ² is

(3jp* -2pq + 3g*) - ^P* " 22)»? + 18p*g* - 2pq^ + q*y. 

This is least when p = q and is then equivalent to the loss of 6 observations.

(Fisher, 19266.) 
CHAPTER 19

CONFIDENCE INTERVALS

19.1. In the previous two chapters we have been concerned with methods which
will provide an estimate of the value of one or more unknown parameters; and the methods
gave functions of the sample values (the estimators) which, for any given sample, provided
a unique estimate. It was of course fully recognised that the estimate might differ
from the parameter in any particular case, and hence that there was a margin of uncertainty.
The extent of this uncertainty was expressed in terms of the sampling variance
of the estimator. With the somewhat intuitional approach which has served our purpose
up to this point, we say that it is probable that θ lies in the range t ± √(var t), very probable
that it lies in the range t ± 2√(var t), and so on. In short, what we have done is in effect
to locate θ in a range and not at a particular point, although we have regarded one point
in the range, viz. t itself, as having a claim to be considered as the “best” estimate of θ.

19.2. In the present chapter we shall examine the logic of this procedure more
closely and look at the problem of estimation from a different point of view. We now
abandon attempts to estimate θ by a function which, for a specified sample, gives a unique
number. Instead we shall consider merely the specification of a range in which θ lies.
We shall not attempt to specify whereabouts in the interval the value of θ really is; all
values in the range have an equal claim to be taken as the “true” value. Nor shall we
assess the probability that θ lies in the interval in the sense that θ is regarded as a random
variable. In fact, in the frequency theory of probability θ is not a random variable (except
trivially in that the frequency of θ is unity when it takes the true value and is zero elsewhere).
Nevertheless, probability plays an essential part in the determination of the
interval and in the degree of confidence we have that it “covers” θ.


Case of one Unknown Parameter

19.3. Consider in the first place a population dependent on a single unknown parameter
θ and suppose that we are given a random sample of n values x_1, ..., x_n from the
population. Let z be a statistic dependent on the x's and on θ, whose sampling distribution
is independent of θ. (The examples given below will show that in some cases at least such
a statistic may be found.) Then, given any probability α, we can find a value z_1 such that

$$\int_{-\infty}^{z_1} dF(z) = \alpha,$$

and this is true whatever the value of θ. In the notation of the theory of probability we
shall then have

$$P(z \leq z_1 \mid \theta) = \alpha. \qquad (19.1)$$

Now it may happen that the inequality z ≤ z_1 can be transformed to the form θ ≤ t_1 or
θ ≥ t_1, where t_1 is some function depending on the value z_1 and the x's but not on θ. For
instance, if z = x̄ − θ we shall have

$$\bar{x} - \theta \leq z_1$$

and hence

$$\theta \geq \bar{x} - z_1.$$

If this transformation can be made we then have, from (19.1),

$$P(\theta \leq t_1 \mid \theta) = \alpha. \qquad (19.2)$$
More generally, suppose that we can find a function t_1, depending on α and the x's
but not on θ, such that (19.2) is true for all θ. Then we may use this equation in probability
to make certain statements about θ.

19.4. Note, in the first place, that we cannot assert that the probability is α that
θ does not exceed a constant t_1. This statement (in the frequency theory of probability)
can only relate to the variation of θ in a population of θ's, and in general we do not know
that θ varies at all. If it is merely an unknown constant then the probability that θ ≤ t_1
is either unity or zero. We do not know which of these values is correct, but we do know
that one of them is correct.

We therefore look at the matter in another way. Although θ is not a random variable,
t_1 is, and will vary from sample to sample. Consequently, if we assert that θ ≤ t_1 in each
case presented for decision, we shall be right in a proportion α of the cases in the long run.
The statement that the probability of θ is less than or equal to some assigned value
has no meaning except in the trivial sense already mentioned; but the statement that
a statistic t_1 is greater than or equal to θ (whatever θ happens to be) has a definite probability
α of being correct. If therefore we make it a rule to assert the inequality θ ≤ t_1
for any sample values which arise, we have the assurance of being right in a proportion
α of the cases “on the average” or “in the long run.”

This idea is basic to the theory of confidence intervals which we proceed to develop,
and the reader should satisfy himself that he has grasped it.
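The long-run interpretation of 19.4 can be illustrated by simulation. In this sketch, which anticipates Example 19.1 and assumes normal samples of n = 5 with unit variance (the values of μ, n, the number of samples and the seeds are arbitrary), the rule “assert μ ≤ t_1”, with t_1 = x̄ + 2/√n, is applied to many samples. The proportion of correct assertions should be close to α = 0.97725 whatever fixed μ is used.

```python
import random
import statistics


def proportion_correct(mu, n=5, reps=40_000, seed=2024):
    """Apply the rule 'assert mu <= t1', t1 = x̄ + 2/√n, to many samples; return the hit rate."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(reps):
        xbar = statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n))
        t1 = xbar + 2 / n**0.5        # upper confidence limit from this sample
        correct += mu <= t1           # True counts as 1
    return correct / reps


for seed, mu in enumerate((0.0, 3.7, -12.5)):
    print(mu, proportion_correct(mu, seed=seed))
```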

19.5. To simplify the exposition we have considered only a single quantity t_1 and
the statement that θ ≤ t_1. In practice, however, we usually seek for two quantities t_0
and t_1, such that

$$P\{t_0 \leq \theta \leq t_1 \mid \theta\} = \alpha, \qquad (19.3)$$

and make the assertion that θ lies in the range t_0 to t_1. These quantities are known as the
Lower and Upper Confidence Limits respectively. They depend only on α and the sample
values. For any fixed α the totality of values of t_0 and t_1 for different samples determines
a field within which θ is asserted to lie. This field is called the Confidence Belt or Region
of Acceptance. We shall give a graphical representation of the idea below. The number
α is called the Confidence Coefficient.

Example 19 J 

Suppose we have a sample of n from the normal population with unit variance 

dF = {“ i — 00 < a: < C30. 

V(27r) 

The distribution of means x will be 

dF = I ““ 1 CO <x < 00 . 

From the tables of the normal integral we know that the probability of a positive deviation 
from the mean not greater than twice the standard deviation is 0-97726. We have 
then — 
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which is equivalent to 



2 




</« 



0-97726. 


Thus, if we assert that ii is greater than or equal to £ — 2/Vn we shall be right in about 
97-725 per cent, of the cases. 

Similarly we have 


P 




<x + 


^/n 



0-97726. 


Hence, combining the two results, 

P^x - <» + A|^| = 2 (0-97726) - 1 = 0-9646. 

Hence, if we assert that [i lies in the range x ± 2/Vw we shall be right in about 95*46 per 

cent, of the cases in the long run. 

Conversely, given the confidence coefficient we can easily find from the tables of the 

normal integral the deviation d such that P 1 5 ^ f ^ j. — a. For instance, 

if a = 0*8, d = 1*28, so that if we assert that fx lies in the range x ± l*28/\/?i the odds 
are 4 to 1 that we shall be right. 

The reader to whom this approach is new will probably ask : but is this not a round- 
about way of using the standard error to set limits to an estimate of the mean ? In a 
way, it is. In effect, what we have done in this example is to show how the use of the 
standard error of the mean in normal samples may be justified on logical grounds without 
appeal to new principles of inference other than those incorporated in the theory of proba- 
bility itself. In particular we make no use of Bayes’ postulate..* 

Another point of interest in this example is that, the upper and lower confidence limits 
derived above are equidistant from the mean :r.^^This is not by any means necessary, 
and it is easy to see that we can derive any number of alternative limits for the same con- 
fidence coefficient a. Suppose, for instance, we take a = 0*9546, and select two numbers 
a© and ai, which obey the condition 

(ao + ai - 1) = 0*9646, 

say ao = 0*9645 and ai = 0*99, Prom the tables of the normal integral we have 


pj* - I I = 0-9646, 

and hence 

_ f _ 2-326 1-806 , 1 

<* + -^(^1 = 0 - 9646 . 

Thus, with the same confidence coefficient we can assert that /i lies in the range x — 2/y/n 
to 5 -1- 2/ V^j or in the range x — 2*326/\/^ to l*806/'\/w. In either case we be 
right in about 96*46 per cent, of the cases. 

We note that in the first case the range is $4/\sqrt{n}$ units and in the second case it is $4.132/\sqrt{n}$ units. Other things being equal, we should choose the first set of limits since they locate the parameter in a narrower range. We shall consider this point in more detail below. It does not always happen that there is an infinity of possible confidence limits or, if there is, that any simple rule of choice between them can be formulated.
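The comparison of the two ranges can be extended to every admissible split. The sketch below computes the width, in units of $1/\sqrt{n}$, for any pair $\alpha_0$, $\alpha_1$ with $\alpha_0 + \alpha_1 - 1 = \alpha$. It assumes that each $\alpha_i$ fixes a one-sided deviation $\Phi^{-1}(\alpha_i)$ of the unit normal, so that the width is the sum of the two deviations. It then scans the splits for $\alpha = 0.9546$; the scan itself is only illustrative. The minimum falls at the central split, in agreement with the preference expressed above.

```python
from statistics import NormalDist

PHI = NormalDist()


def width(a0, a1):
    """Width, in units of 1/sqrt(n), of the range built from one-sided
    deviations Phi^{-1}(a0) and Phi^{-1}(a1)."""
    return PHI.inv_cdf(a0) + PHI.inv_cdf(a1)


def scan_widths(alpha, steps=200):
    """(a0, width) pairs for a0 in (alpha, 1) and a1 = 1 + alpha - a0."""
    out = []
    for i in range(1, steps):
        a0 = alpha + (1 - alpha) * i / steps
        out.append((a0, width(a0, 1 + alpha - a0)))
    return out


print(round(width(0.9646, 0.99), 3))  # the non-central pair, about 4.13
best_a0, best_w = min(scan_widths(0.9546), key=lambda t: t[1])
print(round(best_a0, 4), round(best_w, 3))  # central split, width about 4.0
```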

Graphical Representation

19.6. In a number of simple cases, including that of the previous example, the confidence limits can be represented in a useful graphical form. We take two orthogonal axes, OX relating to the observed $\bar{x}$ and OY to $\mu$ (see Fig. 19.1).



Fig. 19.1.

The two straight lines shown have as their equations

$$\mu = \bar{x} + 2, \qquad \mu = \bar{x} - 2.$$

Consequently, for any point between the lines,

$$\bar{x} - 2 < \mu < \bar{x} + 2.$$

Hence, if for any observed $\bar{x}$ we read off the two ordinates on the lines corresponding to that value we obtain the two confidence limits. The vertical interval between the limits is the confidence range (shown in the diagram for $\bar{x} = 1$), and the total zone between the lines is the confidence belt. We may refer to the two lines as the Upper and Lower Confidence lines respectively.

This example relates to the somewhat trivial case $n = 1$. For different values of $n$ there will be different confidence lines, $\mu = \bar{x} \pm 2/\sqrt{n}$, all parallel to $\mu = \bar{x}$. They may be shown on a single diagram for selected values of $n$, and a figure so constructed provides a useful method of reading off confidence limits in practical work.

Central and Non-central Intervals

19.7. In Example 19.1 the sampling distribution on which the confidence intervals were based was symmetrical, and hence, by taking equal deviations from the mean, we obtained equal areas of the frequency function, and so equal values of $\alpha_0$ and $\alpha_1$. In general we cannot achieve this result with equal deviations, and, subject always to the condition $\alpha_0 + \alpha_1 - 1 = \alpha$, the two quantities may be chosen arbitrarily.

If $\alpha_0$ and $\alpha_1$ are taken to be equal, we shall say that the intervals are central. In such a case we have

$$P(t_0 \le \theta) = P(\theta \le t_1) = \tfrac{1}{2}(1 + \alpha). \qquad (19.4)$$

In the contrary case the intervals will be called non-central.

19.8. In the absence of other considerations it is usually convenient to employ central intervals, but circumstances sometimes arise in which non-central intervals are more serviceable. Suppose, for instance, we are estimating the proportion of some drug in a medicinal preparation and the drug is toxic in large doses. We must then clearly err on the safe side, an excess of the true value over our estimate being more serious than a deficiency. In such a case we might prefer to take $\alpha_1$ very near to unity or even equal to unity, so that

$$P(\theta \le t_1) = 1, \qquad P(t_0 \le \theta) = \alpha,$$

and we are certain that $\theta$ is not greater than $t_1$.

Again, if we are estimating the proportion of viable seed in a sample of material that 
is to be placed on the market, we are more concerned with the accuracy of the lower limit 
than that of the upper limit, for a deficiency of germination is more serious than an excess 
from the grower’s point of view. In such circumstances we should probably take $\alpha_0$ as
large as conveniently possible so as to be nearer to certainty about the minimum value
of viability. This kind of situation often arises in the specification of the quality of a 
manufactured product, the seller wishing to guarantee a minimum standard but being 
much less concerned with whether his product exceeds expectation. 

19.9. On a somewhat similar point, it may be remarked that in certain circumstances it is enough to know that $P\{t_0 \le \theta \le t_1 \mid \theta\}$ is not less than some quantity $\alpha$. We then know that in asserting $\theta$ to lie in the range $t_0$ to $t_1$ we shall be right in at least a proportion $\alpha$ of the cases. Mathematical difficulties in ascertaining confidence limits exactly for given $\alpha$, or theoretical difficulties when the distribution is discontinuous, may, for example, lead us to be content with the inequality rather than the equality of (19.3).

Example 19.2 

To find confidence intervals for the parent proportion $\varpi$ of successes in sampling for attributes.

In samples of $n$ the distribution of successes is given by the binomial $(\chi + \varpi)^n$, where $\chi = 1 - \varpi$. We will determine the limits for the case $n = 20$ and confidence coefficient 0.95.

We require in the first instance the distribution function of the binomial, which is 
obtainable from Table 5.2 (vol. I, p. 119). Summing the number of successes and dividing 
by 10,000, we find from that table the following : — 
Proportion of     Distribution function (proportion p or less) for parent proportion
successes, p      ϖ = 0.1   ϖ = 0.2   ϖ = 0.3   ϖ = 0.4   ϖ = 0.5
0.00              0.1216    0.0115    0.0008    -         -
0.05              0.3918    0.0691    0.0076    0.0005    -
0.10              0.6770    0.2060    0.0354    0.0036    0.0002
0.15              0.8671    0.4114    0.1070    0.0159    0.0013
0.20              0.9569    0.6296    0.2374    0.0509    0.0059
0.25              0.9888    0.8042    0.4163    0.1255    0.0207
0.30              0.9977    0.9133    0.6079    0.2499    0.0577
0.35              0.9997    0.9678    0.7722    0.4158    0.1316
0.40              1.0001    0.9900    0.8866    0.5956    0.2517
0.45              1.0002    0.9974    0.9520    0.7552    0.4119
0.50              -         0.9994    0.9828    0.8723    0.5881
0.55              -         0.9999    0.9948    0.9433    0.7483
0.60              -         1.0000    0.9987    0.9788    0.8684
0.65              -         -         0.9997    0.9934    0.9423
0.70              -         -         0.9999    0.9983    0.9793
0.75              -         -         -         0.9996    0.9941
0.80              -         -         -         0.9999    0.9987
0.85              -         -         -         -         0.9998
0.90              -         -         -         -         1.0000
0.95              -         -         -         -         -


The final figures may be a unit or two in error owing to rounding up, but that need not bother us to the degree of approximation here considered. Values for $\varpi = 0.6$ to $0.9$ may be obtained by symmetry.
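The table can also be computed directly instead of by cumulating printed frequencies. The sketch below uses Python's `math.comb`; the helper names are our own. Exact values differ from the printed ones by at most a unit or two in the fourth decimal, as the text warns. The printed 1.0001 and 1.0002, for example, are rounding artefacts. The symmetry used for $\varpi = 0.6$ to $0.9$ is checked as well.

```python
from math import comb


def binom_cdf(k, n, w):
    """P(X <= k) for X binomial with n trials and success probability w."""
    return sum(comb(n, j) * w**j * (1 - w) ** (n - j) for j in range(k + 1))


def table_row(p, n=20, ws=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Distribution function at proportion p (a multiple of 1/n) for each w."""
    k = round(p * n)
    return [round(binom_cdf(k, n, w), 4) for w in ws]


for p in (0.00, 0.15, 0.30, 0.45):
    print(f"{p:.2f}", table_row(p))
```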

We note in the first place that the variate $p$ is discontinuous. On the other hand we are prepared to consider any value of $\varpi$ in the range 0 to 1. For given $\varpi$ we cannot in general find limits to $p$ for which $\alpha$ is exactly 0.95; but we will take $p$ to be the nearest multiple of 0.05 which gives confidence coefficients at least equal to 0.95, so as to be on the safe side. We will consider only central intervals, so that for given $\varpi$ we have to find $p_0$ and $p_1$ such that

$$P\{\varpi > p_0\} \ge 0.975, \qquad P\{\varpi < p_1\} \ge 0.975,$$

the inequalities for $P$ being as near to equality as we can make them.

Consider the diagrammatic representation of the type shown in Fig. 19.1 and given
for our present case in Fig. 19.2.

From the table we can find, for any assigned $\varpi$, the values $\varpi_0$ and $\varpi_1$ such that $P\{p > \varpi_0\} \ge 0.975$ and $P\{p < \varpi_1\} \ge 0.975$. Note that in determining $\varpi_0$ the distribution function gives the probability of obtaining a proportion $p$ or less of successes, so that its complement gives the probability of a proportion of successes greater than $p$, that is, of a proportion of failures $1 - p - 0.05$ or less (not $1 - p$). Here, for example, on the horizontal through $\varpi = 0.1$ we find $\varpi_0 = 0$ and $\varpi_1 = 0.30$ from our table; and for $\varpi = 0.4$ we have $\varpi_0 = 0.15$ and $\varpi_1 = 0.65$. The points so obtained lie on stepped curves which have been drawn in. The zone between them is the confidence belt. For any $p$ the probability that we shall be wrong in locating $\varpi$ inside the belt is at most 0.05. We determine $p_0$ and $p_1$ by drawing a vertical at the given value of $p$ on the abscissa and reading off the values where it intersects the curves. That these are, in fact, the required limits will be shown in a moment.
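The horizontal construction can be written out as follows. This is a sketch under one reading of the step points, which is ours and not stated in the text. For given $\varpi$, the accepted proportions run from the smallest $k/n$ with $P(X < k) \le 0.025$ to the largest $k/n$ with $P(X > k) \le 0.025$. Then $\varpi_0$ and $\varpi_1$ are the grid proportions just outside that range, clipped to $[0, 1]$. This reproduces the values quoted above for $\varpi = 0.1$ and $0.4$. The function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, w):
    """P(X <= k) for X ~ binomial(n, w); an empty sum (k < 0) gives 0."""
    return sum(comb(n, j) * w**j * (1 - w) ** (n - j) for j in range(k + 1))


def acceptance_range(w, n=20, tail=0.025):
    """Central range (k_lo, k_hi) of success counts for parent proportion w,
    each excluded tail having probability at most `tail`."""
    k_lo = max(k for k in range(n + 1) if binom_cdf(k - 1, n, w) <= tail)
    k_hi = min(k for k in range(n + 1) if 1 - binom_cdf(k, n, w) <= tail)
    return k_lo, k_hi


def step_points(w, n=20, tail=0.025):
    """Abscissae (w0, w1) where the horizontal through w meets the stepped
    curves: the grid proportions just outside the acceptance range."""
    k_lo, k_hi = acceptance_range(w, n, tail)
    return max(0.0, (k_lo - 1) / n), min(1.0, (k_hi + 1) / n)


print(step_points(0.1))  # text: w0 = 0, w1 = 0.30
print(step_points(0.4))  # text: w0 = 0.15, w1 = 0.65
```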
We could have found more precise confidence limits by interpolating in the table obtained above. For example, with $p = 0.30$ we see that

for $\varpi = 0.1$, $P = 0.9977$;
for $\varpi = 0.2$, $P = 0.9133$.

Hence, for $P = 0.975$ we have approximately

$$\varpi = 0.1 + \frac{9977 - 9750}{9977 - 9133}\,(0.1) = 0.127,$$

and closer approximations can be obtained if desired. The point on the lower confidence line corresponding to $\varpi = 0.127$ is $p = 0.35$.

Fig. 19.2.

Calculations on these lines give us the values of $\varpi$ such that

$$P\{p_0 < \varpi < p_1\} = \alpha \quad \text{exactly},$$

whereas the former approach gave values such that

$$P\{p_0 < \varpi < p_1\} = \alpha \text{ approximately}, \qquad \ge \alpha \text{ in any case}.$$

Discontinuous variates usually give rise to this sort of arithmetical nuisance, but the approximation in practice is sufficiently good, except for very small samples. The broken curves in Fig. 19.2 give the more precise limits. They lie, of course, inside the more approximate step-curves.
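The interpolation above, and the closer approximations the text mentions, can be sketched as follows. Linear interpolation between the $\varpi = 0.1$ and $\varpi = 0.2$ entries at $p = 0.30$ gives 0.127. Bisection on the exact binomial distribution function then finds the $\varpi$ at which $P(p \le 0.30)$ is exactly 0.975. The bisection relies on the fact that, for fixed $k$, the binomial distribution function falls as $\varpi$ rises. The helper names are ours.

```python
from math import comb


def binom_cdf(k, n, w):
    """P(X <= k) for X ~ binomial(n, w)."""
    return sum(comb(n, j) * w**j * (1 - w) ** (n - j) for j in range(k + 1))


def linear_interp(w_a, p_a, w_b, p_b, target):
    """w at which the straight line through (w_a, p_a), (w_b, p_b) reaches target."""
    return w_a + (p_a - target) / (p_a - p_b) * (w_b - w_a)


def solve_w(k, n, target, lo=0.0, hi=1.0, iters=60):
    """w with binom_cdf(k, n, w) = target, by bisection (cdf decreases in w)."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# p = 0.30 with n = 20 corresponds to k = 6 successes
approx = linear_interp(0.1, 0.9977, 0.2, 0.9133, 0.975)
exact = solve_w(6, 20, 0.975)
print(round(approx, 3), round(exact, 3))
```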

It is, perhaps, worth noticing that the points on the curves of Fig. 19.2 were constructed by selecting an ordinate $\varpi$ and then finding the corresponding abscissae $\varpi_0$ and $\varpi_1$. The diagram is, so to speak, constructed horizontally. In applying it, however, we read it vertically, that is to say, with observed abscissa $p$ we read off two values $p_0$ and $p_1$ and assert that $p_0 < \varpi < p_1$. It is instructive to observe how this change of viewpoint can be justified without reference to Bayes’ postulate.
Consider Fig. 19.3, which shows a pair of confidence lines for the binomial. Let $\varpi'$ be a given value of $\varpi$ and let the horizontal through $\varpi'$ meet the confidence lines in points with abscissae $\varpi_0$ and $\varpi_1$. Then we know that in repeated samples from a population with parameter $\varpi'$ a proportion $\alpha$ will give observed values of $p$ lying between $\varpi_0$ and $\varpi_1$; for the curves were constructed so that this should be so.

Now since the horizontal at $\varpi'$ lies entirely within the confidence belt for $\varpi_0 < p < \varpi_1$ (and does so for any $\varpi'$), it follows that the assertion that $\varpi'$ lies in the belt is correct if, and only if, $p$ lies between $\varpi_0$ and $\varpi_1$, that is in a proportion $\alpha$ of the cases. This, being true for any $\varpi'$, is true for all $\varpi'$, irrespective of the relative frequency of occurrence of the $\varpi$'s under estimate. Consequently our assertion that $\varpi$ lies in the confidence belt is correct in a proportion $\alpha$ of the cases; and, in particular, for any observed $p$ we may assert that $\varpi$ lies within the ordinates determined on the two curves by the vertical through $p$.
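The argument can be checked numerically. This sketch assumes the belt is made of central acceptance ranges whose excluded tails each have probability at most 0.025; that reading of the stepped curves is ours. It builds the belt horizontally on a fine grid of $\varpi$, with a spacing of 0.005 chosen arbitrarily. It then reads the belt vertically for each possible count. Finally, for each true $\varpi$ on the grid, it computes the exact probability that the vertical reading covers it. That coverage is at least the acceptance probability of the true $\varpi$, and hence at least 0.95 whatever $\varpi$ is. No prior distribution over $\varpi$ is involved.

```python
from math import comb

N = 20
GRID = [i / 200 for i in range(201)]  # trial values of the parent proportion w


def pmf(k, n, w):
    return comb(n, k) * w**k * (1 - w) ** (n - k)


def acceptance_range(w, n=N, tail=0.025):
    """Central range (k_lo, k_hi) of counts accepted for parent proportion w."""
    cdf = [0.0]  # cdf[k] = P(X <= k - 1)
    for k in range(n + 1):
        cdf.append(cdf[-1] + pmf(k, n, w))
    k_lo = max(k for k in range(n + 1) if cdf[k] <= tail)
    k_hi = min(k for k in range(n + 1) if 1 - cdf[k + 1] <= tail)
    return k_lo, k_hi


BELT = {w: acceptance_range(w) for w in GRID}  # built horizontally


def vertical_limits(k):
    """Read the belt vertically at observed count k."""
    ws = [w for w, (lo, hi) in BELT.items() if lo <= k <= hi]
    return min(ws), max(ws)


LIMITS = [vertical_limits(k) for k in range(N + 1)]


def coverage(w_true):
    """Exact probability that the vertical limits contain w_true."""
    return sum(pmf(k, N, w_true) for k in range(N + 1)
               if LIMITS[k][0] <= w_true <= LIMITS[k][1])


print(vertical_limits(6))               # limits for an observed p = 0.30
print(min(coverage(w) for w in GRID))   # never below 0.95
```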


Confidence Intervals for Large Samples 

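19.10. In our usual notation, the logarithm of the likelihood function gives

$$\log L = \sum_{j=1}^{n} \log f(x_j, \theta),$$

and

$$\frac{\partial \log L}{\partial \theta} = \sum_{j=1}^{n} \frac{\partial \log f(x_j, \theta)}{\partial \theta}. \qquad (19.5)$$

We may regard $\partial \log f/\partial \theta$ as a random variable, and in particular write

$$\operatorname{var}\left(\frac{\partial \log f}{\partial \theta}\right) = A, \qquad (19.6)$$

so that, the $x_j$ being independent,

$$\operatorname{var}\left(\frac{\partial \log L}{\partial \theta}\right) = nA. \qquad (19.7)$$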
Write

$$\psi = \frac{\partial \log L/\partial \theta}{\sqrt{nA}}. \qquad (19.8)$$

Then, for large samples, $\psi$ will be distributed normally in the limit with unit variance, in virtue of the Central Limit Theorem, under very general conditions. It will also have zero mean, since

$$E\left(\frac{\partial \log f}{\partial \theta}\right) = \int \frac{\partial \log f}{\partial \theta}\, f \, dx = \int \frac{\partial f}{\partial \theta}\, dx = \frac{\partial}{\partial \theta} \int f \, dx = 0. \qquad (19.9)$$

Hence, from the distribution of $\psi$ we may easily determine confidence limits for $\theta$ in large samples if $\psi$ is a monotonic function of $\theta$, so that inequalities in one may be transformed to inequalities in the other.


It is sufficient (but not necessary) for the existence of the normal limit to $\psi$ that $\partial f/\partial \theta$ exists for all $x$, except perhaps at isolated points, that the range is independent of $\theta$ and that the Central Limit Theorem applies (e.g. if the third moment of $\partial \log f/\partial \theta$ exists). We also assume, as usual, that differentiation under the integral sign, as in (19.9), is legitimate.
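The recipe of this section can be sketched as follows. The user supplies a function returning the score sum $\partial \log L/\partial \theta$ and a function returning $A(\theta)$. The code forms $\psi(\theta)$ and solves $\psi(\theta) = \pm z$ by bisection. It assumes, as the text requires, that $\psi$ is monotonic in $\theta$; it further assumes that $\psi$ is decreasing on a user-supplied bracket. The function names, the bracket and the data are illustrative. For the unit-variance normal of Example 19.1 the answer should be $\bar{x} \mp 1.96/\sqrt{n}$.

```python
from statistics import NormalDist


def psi(theta, score_sum, info, n):
    """psi = (d log L / d theta) / sqrt(n A), as in (19.8)."""
    return score_sum(theta) / (n * info(theta)) ** 0.5


def score_limits(score_sum, info, n, bracket, alpha=0.95, iters=100):
    """Central limits: the roots of psi(theta) = +z and -z, z being the normal
    deviate for alpha. Assumes psi decreases in theta on `bracket`."""
    z = NormalDist().inv_cdf((1 + alpha) / 2)

    def root(target):
        lo, hi = bracket
        for _ in range(iters):
            mid = (lo + hi) / 2
            if psi(mid, score_sum, info, n) > target:
                lo = mid  # psi still above target: root lies further right
            else:
                hi = mid
        return (lo + hi) / 2

    return root(z), root(-z)  # lower limit, upper limit


# Unit-variance normal: d log f / d mu = x - mu, so A = 1
data = [4.1, 5.3, 4.8, 5.9, 4.4, 5.0, 5.6, 4.7]
n = len(data)
xbar = sum(data) / n
lower, upper = score_limits(lambda m: sum(x - m for x in data),
                            lambda m: 1.0, n, bracket=(0.0, 10.0))
print(lower, upper)
```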


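Example 19.3

Consider again the problem of Example 19.1. We have, with $\mu$ for $\theta$,

$$f(x, \mu) = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\},$$

$$\frac{\partial \log f}{\partial \mu} = x - \mu, \qquad \operatorname{var}\left(\frac{\partial \log f}{\partial \mu}\right) = 1.$$

Hence

$$\psi = \sqrt{n}\,(\bar{x} - \mu)$$

is normally distributed with unit variance for large $n$. (We know, of course, that this is true for small $n$ as well in this particular case.) The confidence limits may then be set as in Example 19.1.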


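Example 19.4

Consider the Poisson distribution whose general term is

$$f(x, \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}.$$

We have

$$\frac{\partial \log f}{\partial \lambda} = \frac{x}{\lambda} - 1, \qquad \operatorname{var}\left(\frac{\partial \log f}{\partial \lambda}\right) = \frac{1}{\lambda}.$$

Hence

$$\psi = \sqrt{\frac{n}{\lambda}}\,(\bar{x} - \lambda).$$

For example, with $\alpha = 0.95$, corresponding to a normal deviate $\pm 1.96$, we have, for the central confidence limits,

$$\sqrt{\frac{n}{\lambda}}\,(\bar{x} - \lambda) = \pm 1.96,$$

giving, on squaring and solving for $\lambda$,

$$\lambda^2 - \left(2\bar{x} + \frac{(1.96)^2}{n}\right)\lambda + \bar{x}^2 = 0,$$

the two signs of the square root in the solution giving the upper and lower limits respectively. To order $n^{-1/2}$ this is equivalent to

$$\lambda = \bar{x} \pm 1.96\sqrt{\frac{\bar{x}}{n}},$$

from which the upper and lower limits are seen to be equidistant from the mean $\bar{x}$, as we should expect.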
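The sketch below compares the two solutions for invented figures, $\bar{x} = 3.2$ from $n = 50$. It gives the roots of the quadratic and the approximation $\bar{x} \pm 1.96\sqrt{\bar{x}/n}$. The exact roots are centred on $\bar{x} + 1.96^2/(2n)$ rather than on $\bar{x}$. That shift is the term of order $n^{-1}$ dropped in the approximation.

```python
import math


def poisson_score_limits(xbar, n, z=1.96):
    """Roots of lambda^2 - (2 xbar + z^2/n) lambda + xbar^2 = 0, i.e. the
    solutions of sqrt(n/lambda) * (xbar - lambda) = +z and -z."""
    b = 2 * xbar + z * z / n
    disc = math.sqrt(b * b - 4 * xbar * xbar)
    return (b - disc) / 2, (b + disc) / 2


def poisson_approx_limits(xbar, n, z=1.96):
    """Approximation to order n^(-1/2): xbar -/+ z sqrt(xbar / n)."""
    half_width = z * math.sqrt(xbar / n)
    return xbar - half_width, xbar + half_width


print(poisson_score_limits(3.2, 50))
print(poisson_approx_limits(3.2, 50))
```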


Shortest Sets of Confidence Intervals 

19.11. It has been seen in Example 19.1 that in some circumstances at least there exists more than one set of confidence intervals, and it is now necessary to consider whether any particular set can be regarded as better than the others in any useful sense. The problem is analogous to that of estimators, where we found that in general there are many different estimators for a parameter, but that we could sometimes find one (such as that with minimum variance) which was superior to the rest.

In Example 19.1 the problem presented itself in rather a specialised form. We found that for the intervals based on the mean $\bar{x}$ there were infinitely many sets of intervals according to the way in which we selected $\alpha_0$ and $\alpha_1$ (subject to the condition that $\alpha_0 + \alpha_1 = 1 + \alpha$). Among these the central intervals are obviously the shortest, for a given range will include the greatest area of the normal curve if it is centred at the mean of the curve. We might reasonably say that the central intervals are the best among those determined by $\bar{x}$.

But it does not follow that they are the shortest of all possible intervals, or even that such a shortest set exists. It might also happen that for two sets of intervals $C_1$ and $C_2$ those of $C_1$ are shorter than those of $C_2$ in part of the range of $x$'s and longer in other parts.
19.12. We will therefore consider sets of intervals which are shortest on the average. That is to say, if

$$\delta = t_1 - t_0,$$

we require that

$$\int \delta \, dF = \text{minimum}, \qquad (19.10)$$

where the integral is taken over all $x$'s and is therefore equivalent to

$$\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty} \delta \, L \, dx_1 \cdots dx_n. \qquad (19.11)$$

We now prove a theorem which is very similar to the result that maximum-likelihood estimators in the limit have minimum variance, namely that in a certain class of intervals the method of 19.10 gives those which are shortest on the average.

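Let $h(x, \theta)$ be a function which has a zero mean value and is such that the sum of a number of similar functions obeys the Central Limit Theorem. Then

$$\zeta = \frac{\sum_{j=1}^{n} h(x_j, \theta)}{\sqrt{n \operatorname{var} h}} \qquad (19.12)$$

is normally distributed in the limit with zero mean and unit variance. Call the class of such functions $C$. The $\psi$ of equation (19.8) is a member of the class $C$. We prove that the average rate of change of $\psi$ with respect to $\theta$, for each fixed $\theta$, is greater in absolute value than that of any other $\zeta$, except in the trivial case

$$h \propto \frac{\partial \log f}{\partial \theta}.$$

Writing $g(x, \theta) = \partial \log f/\partial \theta$, we have

$$\frac{\partial \psi}{\partial \theta} = \frac{1}{\sqrt{n \operatorname{var} g}} \left\{ \sum \frac{\partial g}{\partial \theta} - \frac{\sum g}{2 \operatorname{var} g} \, \frac{\partial \operatorname{var} g}{\partial \theta} \right\}, \qquad (19.13)$$

and similarly

$$\frac{\partial \zeta}{\partial \theta} = \frac{1}{\sqrt{n \operatorname{var} h}} \left\{ \sum \frac{\partial h}{\partial \theta} - \frac{\sum h}{2 \operatorname{var} h} \, \frac{\partial \operatorname{var} h}{\partial \theta} \right\}. \qquad (19.14)$$

Now $E(g) = 0$ and

$$E\left(\frac{\partial g}{\partial \theta}\right) = E\left(\frac{\partial^2 \log f}{\partial \theta^2}\right) = -E(g^2).$$

Hence

$$E\left(\frac{\partial \psi}{\partial \theta}\right) = \frac{-n E(g^2)}{\sqrt{n \operatorname{var} g}} = -\sqrt{n \operatorname{var} g} = \Lambda_1, \text{ say}. \qquad (19.15)$$

Similarly,

$$E\left(\frac{\partial \zeta}{\partial \theta}\right) = \frac{n E(\partial h/\partial \theta)}{\sqrt{n \operatorname{var} h}} = \Lambda_2, \text{ say}. \qquad (19.16)$$

Since $E(h) = 0$ we have

$$\int h f \, dx = 0, \qquad \int \frac{\partial h}{\partial \theta} f \, dx + \int h \frac{\partial f}{\partial \theta} \, dx = 0,$$

that is,

$$E\left(\frac{\partial h}{\partial \theta}\right) = -E(hg) = -\operatorname{cov}(h, g).$$

Hence

$$\Lambda_1^2 - \Lambda_2^2 = n \operatorname{var} g - \frac{n}{\operatorname{var} h} \operatorname{cov}^2(h, g) = \frac{n}{\operatorname{var} h} \{\operatorname{var} h \operatorname{var} g - \operatorname{cov}^2(h, g)\}.$$

Thus, unless $h$ is a multiple of $g$, we have

$$\Lambda_1^2 > \Lambda_2^2,$$

which was to be proved.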

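Now if $\psi_\alpha$ is a value such that

$$\frac{1}{\sqrt{2\pi}} \int_{-\psi_\alpha}^{\psi_\alpha} e^{-\frac{1}{2}y^2} \, dy = \alpha, \qquad (19.17)$$

the upper and lower confidence points for central intervals are $\pm\psi_\alpha$, and the corresponding values of $\theta$ are the solutions of

$$\frac{\sum g(x, \theta)}{\sqrt{n \operatorname{var} g}} = \pm\psi_\alpha, \qquad (19.18)$$

say $t_0$ and $t_1$. Similarly those for any function $h$ are given by

$$\frac{\sum h(x, \theta)}{\sqrt{n \operatorname{var} h}} = \pm\psi_\alpha, \qquad (19.19)$$

say $u_0$ and $u_1$. The equations for confidence points are equivalent to

$$\psi(t) = \pm\psi_\alpha, \qquad \zeta(u) = \pm\psi_\alpha,$$

or, effectively, in large samples, to

$$\psi(\theta_0) + (t - \theta_0)\frac{\partial \psi}{\partial \theta} = \pm\psi_\alpha, \qquad \zeta(\theta_0) + (u - \theta_0)\frac{\partial \zeta}{\partial \theta} = \pm\psi_\alpha, \qquad (19.20)$$

where $\theta_0$ is a fixed value of $\theta$. When $t \sim \theta_0$ and $u \sim \theta_0$ we have $\psi(\theta_0) \sim \zeta(\theta_0)$. Hence

$$(t - \theta_0)\frac{\partial \psi}{\partial \theta} = (u - \theta_0)\frac{\partial \zeta}{\partial \theta}.$$

Now we have just shown that, on the average, $|\partial\psi/\partial\theta| > |\partial\zeta/\partial\theta|$. Hence, on the average,

$$|t - \theta_0| < |u - \theta_0|,$$

and the confidence limits $t$ are closer together than the limits $u$ given by any other member of the class $C$, for any fixed value of $\theta$.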
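The size of the effect can be seen in a concrete case. Take the unit-variance normal of Example 19.1, with $g(x, \mu) = x - \mu$. For comparison, take $h(x, \mu) = \operatorname{sign}(x - \mu)$, which has zero mean and obeys the Central Limit Theorem; this choice of $h$ is ours, for illustration only. By (19.20) the two widths are about $2\psi_\alpha/|\Lambda_1|$ and $2\psi_\alpha/|\Lambda_2|$. Their ratio is therefore $|\Lambda_1/\Lambda_2| = \sqrt{\operatorname{var} g \operatorname{var} h}/|\operatorname{cov}(h, g)|$, which the Cauchy-Schwarz inequality keeps at or above 1. Here it should be $\sqrt{\pi/2} \approx 1.2533$. The sketch below estimates the ratio by simulation.

```python
import math
import random


def width_ratio(trials=200000, seed=7):
    """Monte Carlo estimate of sqrt(var g * var h) / |cov(g, h)| for
    g = x - mu and h = sign(x - mu) under an N(mu, 1) parent (mu = 0 here)."""
    rng = random.Random(seed)
    g = [rng.gauss(0.0, 1.0) for _ in range(trials)]
    h = [1.0 if x > 0 else -1.0 for x in g]
    mg, mh = sum(g) / trials, sum(h) / trials
    var_g = sum((a - mg) ** 2 for a in g) / trials
    var_h = sum((b - mh) ** 2 for b in h) / trials
    cov = sum((a - mg) * (b - mh) for a, b in zip(g, h)) / trials
    return math.sqrt(var_g * var_h) / abs(cov)


print(width_ratio(), math.sqrt(math.pi / 2))  # both near 1.2533
```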


19.13. A comparison of the result we have just proved and the properties of maximum likelihood estimators in the limit will show the close relation between confidence intervals and the theory of estimation developed in Chapter 17. In 17.27 we showed, by considering the quantity

$$u = \frac{\partial \log L}{\partial \theta},$$

that any estimator $t$ which is in the limit distributed normally about the true value $\theta_0$ cannot have a variance less than

$$\frac{1}{E\{(\partial \log L/\partial \theta)^2\}},$$

and that the latter quantity, in the limit, is the variance of the maximum likelihood estimator. The variance attains this minimal value when $u$ is constant over samples for which $t$ is constant.

The theorem of 19.12 shows that on the average the intervals determined by the distribution of $u$ are shorter than those based on any other function with a zero mean value (obeying the usual conditions as to continuity, etc.). Since the maximum likelihood estimator has minimum variance, we should expect that confidence intervals based on its distribution would be shorter than others; and this we now see to be so. For if $u$ is constant over samples of constant $t$, the distribution of $u$ in all samples is equivalent to that of $t$.


Confidence Intervals and Sufficient Estimators

19.14. Pursuing this line of thought, we are led to inquire whether sufficient estimators provide confidence intervals for finite samples and whether they have any minimal properties of the kind we have just established for large samples.

It is easy to see that sufficient estimators do in fact provide confidence intervals. If $t$ is sufficient for $\theta$, the likelihood function may be put in the form

$$L = f_1(t, \theta)\, f_2(x_1, \ldots, x_n), \qquad (19.22)$$

and the distribution of $t$, which involves only $t$ and $\theta$, is

$$dF = f_1(t, \theta)\, dt. \qquad (19.23)$$

Given $\alpha$ we can then find $t_0$ and $t_1$ such that $F(t_0, \theta) = 1 - \alpha_0$ and $F(t_1, \theta) = \alpha_1$, and solve for $\theta$ in terms of $t_0$ and $\alpha_0$ or $t_1$ and $\alpha_1$, as the case may be. This process will provide the inequalities of the type we require, a proposition which we shall prove formally in 19.25 below.


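Example 19.5

In Example 17.8 we saw that

$$S = \sum_{j=1}^{n} x_j$$

is sufficient for $\theta$ in the distribution

$$dF = \frac{x^{p-1} e^{-x/\theta}}{\Gamma(p)\,\theta^{p}} \, dx, \qquad 0 \le x < \infty, \quad p > 1,$$

where $p$ is regarded as known. The distribution of $S$ is in fact

$$dF = \frac{S^{np-1} e^{-S/\theta}}{\Gamma(np)\,\theta^{np}} \, dS.$$

The distribution function of $m = S/\theta$ is the incomplete $\Gamma$-function

$$\frac{\Gamma_m(np)}{\Gamma(np)} = \frac{1}{\Gamma(np)} \int_0^m y^{np-1} e^{-y} \, dy.$$

We then find the values of $m$ corresponding to $\alpha_0$ and $\alpha_1$ from the tables, and have

$$P(m < m_0) = \alpha_0, \qquad P(m > m_1) = \alpha_1,$$

whence

$$P\left\{\frac{S}{m_0} < \theta < \frac{S}{m_1}\right\} = \alpha_0 + \alpha_1 - 1 = \alpha.$$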

19.15. The position in regard to minimal properties of confidence intervals based on sufficient estimators remains somewhat obscure, but one would expect some such properties to hold even for finite $n$. Since $u = \partial \log L/\partial \theta$ is constant for constant $t$ when $t$ is sufficient, the variance of $u$ will be a function of the variance of $t$. This, however, is not necessarily enough to establish that the corresponding confidence intervals are shortest on the average. It is imaginable that the confidence intervals derived from its distribution might be longer on the average than those of some other system. This seems rather unlikely, at least for the ordinary distributions of statistical theory, but apparently no proof has been given.


19.16. Neyman (1937b) has proposed to apply the phrase “shortest confidence intervals” to sets of intervals defined in quite a different way. As it does not appear that such intervals are necessarily the shortest in the sense of possessing the least length, even on the average, we shall attempt to avoid confusion by calling them “most selective.”

Consider a set of intervals $C_0$, typified by $\delta_0$, obeying the condition that

$$P\{\delta_0 \,\mathrm{c}\, \theta \mid \theta\} = \alpha, \qquad (19.24)$$

where we write $\delta_0 \,\mathrm{c}\, \theta$ (that is, $\delta_0$ “contains” $\theta$) for the more usual $t_0 \le \theta \le t_1$ ($t_1 - t_0 = \delta_0$).

Let $C_1$ be some other set typified by $\delta_1$ such that

$$P\{\delta_1 \,\mathrm{c}\, \theta \mid \theta\} = \alpha. \qquad (19.25)$$

Either set is a permissible set of intervals, as the probability is $\alpha$ in both cases that the range $\delta$ contains $\theta$.

If now for every $C_1$ we have, for any value $\theta'$ other than the true value,

$$P\{\delta_0 \,\mathrm{c}\, \theta' \mid \theta\} \le P\{\delta_1 \,\mathrm{c}\, \theta' \mid \theta\}, \qquad (19.26)$$

$C_0$ is said to be most selective.

19.17. The ideas underlying this definition will be clearer from a reading of Chapters 26 and 27 dealing with the Neyman-Pearson theory of inference. We anticipate them here to the extent of remarking that the object of most selective intervals is to cover the true value with assigned probability $\alpha$, but to cover other values as little as possible. We may say of both $C_0$ and $C_1$ that the assertion $\delta \,\mathrm{c}\, \theta$ is true in a proportion $\alpha$ of the cases. What marks out $C_0$ for choice as the most selective set is that it covers false values less frequently than the remaining sets.

The difference between this approach and the one leading to shortest intervals is that the latter is concerned only with the narrowness of the confidence interval, whereas the former gives weight to the frequency with which alternative values of $\theta$ are covered. One concentrates on locating $\theta$ with the smallest margin of error; the other takes into account the desirability of excluding so far as possible false values of $\theta$ from the interval, so that mistakes of taking the wrong value are minimised.

19.18. Neyman himself has shown that most selective sets do not usually exist (for instance, if the distribution is continuous) and has proposed two alternative systems:

(a) most selective one-sided systems (Neyman’s “shortest one-sided” sets) which obey (19.26) only for values of $\theta' - \theta$ which are always positive or always negative;

(b) selective unbiassed systems (Neyman’s “short unbiassed” sets) which obey (19.26) but, in place of (19.25), the further relation

$$P\{\delta \,\mathrm{c}\, \theta \mid \theta\} = \alpha \ge P\{\delta \,\mathrm{c}\, \theta \mid \theta'\}. \qquad (19.27)$$

In essence these sets amount to a translation into terms of confidence intervals of certain ideas in the theory of tests of significance, and we may defer consideration of them until Chapters 26 and 27 are reached.
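Why no most selective set exists in the continuous case can be seen in the simplest example. Take the unit-variance normal mean and two sets of intervals with the same $\alpha = 0.95$. The first is the central set $\bar{x} \pm 1.96/\sqrt{n}$. The second is a non-central set from $\bar{x} - 2.326/\sqrt{n}$ to $\bar{x} + 1.751/\sqrt{n}$, with one-sided probabilities 0.99 and 0.96 chosen for illustration. For a false value $\theta' = \theta + d/\sqrt{n}$, the probability of covering $\theta'$ is $\Phi(d + a) - \Phi(d - b)$, where $a$ and $b$ are the lower and upper deviations. In the sketch below, neither set covers false values less often on both sides of $\theta$, so neither satisfies (19.26) against the other. This is consistent with the one-sided systems of (a).

```python
from statistics import NormalDist

PHI = NormalDist()


def cover_prob(d, a, b):
    """P(xbar - a/sqrt(n) <= theta' <= xbar + b/sqrt(n)) when the true mean
    is theta and theta' = theta + d/sqrt(n); unit-variance normal parent."""
    return PHI.cdf(d + a) - PHI.cdf(d - b)


central = (1.96, 1.96)
noncentral = (PHI.inv_cdf(0.99), PHI.inv_cdf(0.96))  # about 2.326 and 1.751

for d in (-1.0, 0.0, 1.0):
    print(d, round(cover_prob(d, *central), 4), round(cover_prob(d, *noncentral), 4))
```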

Generalisation to the Case of Several Parameters

19.19. We now proceed to generalise the foregoing theory to the case of several parameters. Although, to simplify the exposition, we shall deal in detail only with a single variate, the theory is quite general. We begin by extending our notation and introducing a geometrical terminology which may be regarded as an elaboration of the diagrams of Figs. 19.1 and 19.2.

Suppose we have a frequency function of known form depending on $l$ unknown parameters, $\theta_1, \ldots, \theta_l$, and denoted by $f(x, \theta_1, \ldots, \theta_l)$. We may require to estimate either $\theta_1$ only or several of the $\theta$'s simultaneously. In the first place we consider only the estimation of a single parameter. To determine confidence limits we require to find two functions $u_0$ and $u_1$, dependent on the sample values but not on the $\theta$'s, such that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha, \qquad (19.28)$$

where $\alpha$ is the confidence coefficient chosen in advance.

With a sample of $n$ values, $x_1, \ldots, x_n$, we can associate a point in an $n$-dimensional Euclidean space, and the frequency-distribution will determine a density function for each such point. The quantities $u_0$ and $u_1$, being functions of the $x$'s, are determined in this space, and for any given $\alpha$ will lie on two hypersurfaces (the natural extension of the confidence lines of Fig. 19.1). Between them will lie a Confidence Zone or Region of Acceptance.

In general we also have to consider a range of values of $\theta$ which are a priori possible. There will thus be an $l$-dimensional space of $\theta$'s subjoined to the $n$-space, the total region of variation having $(l + n)$ dimensions; but if we are considering the estimation of $\theta_1$, this reduces to an $(n + 1)$-space, the other $(l - 1)$ parameters not appearing as variables.

We shall call the sample-space $W$ and denote a point whose co-ordinates are $x_1, \ldots, x_n$ by $E$. We may then write $u_0(E)$, $u_1(E)$ to show that the confidence functions depend on $E$. The interval $u_1(E) - u_0(E)$ we denote by $\delta(E)$ or $\delta$, and as above we write $\delta \,\mathrm{c}\, \theta_1$ to denote $u_0 \le \theta_1 \le u_1$. The region of acceptance or confidence zone we denote by $A$, and may write $E \in \delta$ or $E \in A$ to indicate that the sample-point lies in the interval $\delta$ or the region $A$.
19.20. In Fig. 19.4 we have shown two axes $x_1$ and $x_2$ and a third axis corresponding to the variation of $\theta_1$. The sample-space $W$ is thus two-dimensional. For any given $\theta_1$, say $\theta_1'$, the space $W$ is a hyperplane (or part of it), one such being shown.

Fig. 19.4.

Take any given pair of values $(x_1, x_2)$ and draw through the point so defined a line parallel to the $\theta_1$-axis, such as $PQ$ in the figure, cutting the hyperplane at $R$. The two values of $u_0$ and $u_1$ will give two limits to $\theta_1$ corresponding to two points on this line, say $U$, $V$. Consider now the lines $PQ$ as $x_1$, $x_2$ vary. In some cases $U$, $V$ will lie on opposite sides of $R$, and $\theta_1'$ lies inside the interval $UV$. In other cases (as for instance in $U'V'$ shown in the figure) the contrary is true. The totality of points in the former category determines the region of acceptance $A$, shaded in the figure. If for any point in $A$ we assert $\delta \,\mathrm{c}\, \theta_1'$, we shall be right; if we assert it for points outside $A$ we shall be wrong.

19.21. Evidently, if the sample-point $E$ falls in the region $A$, the corresponding $\theta_1'$ lies in the confidence interval, and conversely. It follows that the probability of any fixed $\theta_1'$ lying in the confidence interval is the probability that $E$ lies in $A(\theta_1')$; or, in symbols,

$$P\{\delta \,\mathrm{c}\, \theta_1' \mid \theta_1, \ldots, \theta_l\} = P\{u_0 \le \theta_1' \le u_1 \mid \theta_1, \ldots, \theta_l\} = P\{E \in A(\theta_1') \mid \theta_1, \ldots, \theta_l\}. \qquad (19.29)$$

From this it follows that if the confidence functions are determined so that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1, \ldots, \theta_l\} = \alpha$$

we shall have, for all $\theta_1$,

$$P\{E \in A(\theta_1) \mid \theta_1, \ldots, \theta_l\} = \alpha. \qquad (19.30)$$

It follows also that for no $\theta_1$ can the region $A$ be empty, for if it were the probability in (19.30) would be zero.
19.22. If the functions $u_0$ and $u_1$ are single-valued and determined for all $E$, then any sample-point will fall into at least one region of acceptance. For on the line $PQ$ corresponding to the given $E$ we take an $R$ between $U$ and $V$, and this will define a value of $\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$.

More important, if a sample-point falls in the regions $A(\theta_1')$ and $A(\theta_1'')$ corresponding to two values of $\theta_1$, $\theta_1'$ and $\theta_1''$, it will fall in the region $A(\theta_1''')$, where $\theta_1'''$ is any value between $\theta_1'$ and $\theta_1''$. For we have

$$u_0 \le \theta_1' \le u_1, \qquad u_0 \le \theta_1'' \le u_1,$$

and hence, if $\theta_1''$ is the greater,

$$u_0 \le \theta_1' \le \theta_1''' \le \theta_1'' \le u_1,$$

or $u_0 \le \theta_1''' \le u_1$.

Further, if a sample-point falls in each of the regions $A(\theta_1)$ for the range of $\theta$-values $\theta_1' < \theta_1 < \theta_1''$, it must also fall within $A(\theta_1')$ and $A(\theta_1'')$.

19.23. The conditions referred to in the two previous sections are necessary. We now prove that they are sufficient, that is to say: if for each value of $\theta_1$ there is defined in the sample-space $W$ a region $A$ such that

(1) $P\{E \in A(\theta_1) \mid \theta_1\} = \alpha$, whatever the value of the $\theta$'s;

(2) for any $E$ there is at least one $\theta_1$, say $\theta_1'$, such that $E \in A(\theta_1')$;

(3) if $E \in A(\theta_1')$ and $E \in A(\theta_1'')$, then $E \in A(\theta_1''')$ for any $\theta_1'''$ between $\theta_1'$ and $\theta_1''$;

(4) if $E \in A(\theta_1)$ for every $\theta_1$ satisfying $\theta_1' < \theta_1 < \theta_1''$, then $E \in A(\theta_1')$ and $E \in A(\theta_1'')$;

then $u_0$ and $u_1$, viz. confidence limits for $\theta_1$, are given by taking the lower and upper bounds of values of $\theta_1$ for which a fixed sample-point falls within $A(\theta_1)$. They are determinate and single-valued for all $E$, $u_0 \le u_1$, and $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$ for all $\theta_1$.

The lower and upper bounds exist in virtue of condition (2), and the lower is not greater than the upper. We have then merely to show that $P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = \alpha$, and for this it is sufficient, in virtue of condition (1), to show that

$$P\{u_0 \le \theta_1 \le u_1 \mid \theta_1\} = P\{E \in A(\theta_1) \mid \theta_1\}. \qquad (19.31)$$

We already know that if $E \in A(\theta_1)$ then $u_0 \le \theta_1 \le u_1$; and our result will be established if we demonstrate the converse.

Suppose it is not true that when $u_0 \le \theta_1 \le u_1$, $E \in A(\theta_1)$. Let $E'$ be a point outside $A(\theta_1)$ for which $u_0 \le \theta_1 \le u_1$. Then either $u_0 = \theta_1$ or $u_1 = \theta_1$ or both; for otherwise, $u_0$ and $u_1$ being the bounds of the values of $\theta_1$ for which $E'$ lies in $A(\theta_1)$, there would exist values $\theta_1'$ and $\theta_1''$ such that $E' \in A(\theta_1')$ and $E' \in A(\theta_1'')$ and

$$u_0 \le \theta_1' < \theta_1 < \theta_1'' \le u_1,$$

so that, from condition (3), $E' \in A(\theta_1)$, which is contrary to assumption.

Thus $u_0 = \theta_1$ or $u_1 = \theta_1$ or both. If both, then $E'$ must fall in $A(\theta_1)$, for $u_0$ and $u_1$ are the bounds of $\theta$-values for which this is so, and if they coincide their common value must be such a value. Finally, if $u_0 = \theta_1 < u_1$ (and similarly if $u_0 < \theta_1 = u_1$) we see that $E'$ must fall in $A(\theta_1^{*})$ for every $\theta_1^{*}$ with $u_0 < \theta_1^{*} < u_1$, from condition (3), and hence, from condition (4), $E'$ must fall in $A(\theta_1')$ and $A(\theta_1'')$ where $\theta_1' = u_0$ and $\theta_1'' = u_1$. Hence it falls in $A(\theta_1)$.

19.24. The foregoing theorem gives us a formal solution of the problem of finding confidence intervals in the general case, but it does not provide a method of finding the intervals in particular instances. In practice we have three lines of approach: (1) to use sufficient estimators, (2) to adopt the process known as “studentisation”, and (3) to “guess” a set of intervals in the light of general knowledge and experience and to verify that they do or do not satisfy the required conditions.

19.25. Consider the use of sufficient estimators in the general case. If $t_1$ is sufficient for $\theta_1$ we have

$$L = L_1(t_1, \theta_1)\, L_2(x_1, \ldots, x_n, \theta_2, \ldots, \theta_l). \qquad (19.32)$$

The locus $t_1 = $ constant determines a series of hypersurfaces in the sample-space $W$. If we regard these hypersurfaces as determining regions in $W$, then $t_1 \le k$, say, determines a fixed region $K$. The probability that $E$ falls in $K$ is then clearly dependent only on $t_1$ and $\theta_1$. By appropriate choice of $k$ we can determine $K$ so that

$$P\{E \in K \mid \theta_1\} = \alpha,$$

and hence set up regions of acceptance based on values of $t_1$. We can do so, moreover, in an infinity of ways, according to the values selected for $\alpha_0$ and $\alpha_1$.


Studentisation

19.26. In Example 19.1 we considered a simplified problem of estimating the mean in samples from a normal population with unit variance. Suppose now that we require to determine confidence limits for the mean $\mu$ in samples from

$$dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{(x - \mu)^2}{2\sigma^2}\right\} dx.$$

The approach of Example 19.1 would lead us to the conclusion that, for confidence coefficient 0.9546 and central intervals,

$$P\left\{\bar{x} - \frac{2\sigma}{\sqrt{n}} \le \mu \le \bar{x} + \frac{2\sigma}{\sqrt{n}}\right\} = 0.9546.$$

But we cannot now say that the confidence limits are $\bar{x} \pm 2\sigma/\sqrt{n}$, because $\sigma$ is unknown.

Consider then the distribution of $z = (\bar{x} - \mu)/s$, where $s^2 = \sum(x - \bar{x})^2/n$ is the sample variance. This is known to be the “Student” form

$$dF = \frac{\Gamma(n/2)}{\sqrt{\pi}\,\Gamma\{(n-1)/2\}} \, \frac{dz}{(1 + z^2)^{n/2}}, \qquad -\infty < z < \infty.$$

(Cf. Example 10.6, vol. I, p. 239.) Given $\alpha$, we can now find $z_0$ and $z_1$ such that

$$P\{-z_1 \le z \le z_0\} = \alpha,$$

which is equivalent to

$$P\{\bar{x} - s z_0 \le \mu \le \bar{x} + s z_1\} = \alpha.$$

Hence we may say that $\mu$ lies in the range $\bar{x} - s z_0$ to $\bar{x} + s z_1$ with confidence coefficient $\alpha$, the range now being independent of either $\mu$ or $\sigma$. In fact, owing to the symmetry of “Student’s” distribution, $z_0 = z_1$ for central intervals, but this is an accidental circumstance peculiar to the present case.
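In place of printed tables, $z_0$ can be computed numerically. The sketch below assumes central intervals and $s^2 = \sum(x - \bar{x})^2/n$, the divisor that gives the density the form const $\times (1 + z^2)^{-n/2}$. It integrates the density by Simpson's rule and inverts it by bisection. For $n = 2$ the density is Cauchy, so $z_0 = \tan(\pi\alpha/2)$, which gives a check. The sample values and helper names are illustrative.

```python
import math
import statistics


def student_z_pdf(z, n):
    """Density of z = (xbar - mu)/s with s^2 = sum((x - xbar)^2)/n."""
    c = math.exp(math.lgamma(n / 2) - math.lgamma((n - 1) / 2)) / math.sqrt(math.pi)
    return c / (1 + z * z) ** (n / 2)


def central_mass(a, n, steps=2000):
    """P(-a <= z <= a) by Simpson's rule on [0, a] (steps must be even)."""
    h = a / steps
    s = student_z_pdf(0.0, n) + student_z_pdf(a, n)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * student_z_pdf(i * h, n)
    return 2 * s * h / 3


def central_z(n, alpha, iters=60):
    """z0 with P(-z0 <= z <= z0) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while central_mass(hi, n) < alpha:
        hi *= 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if central_mass(mid, n) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def student_limits(xs, alpha=0.95):
    """Central limits xbar -/+ s z0, with s computed with divisor n."""
    xbar, s = statistics.fmean(xs), statistics.pstdev(xs)
    z0 = central_z(len(xs), alpha)
    return xbar - s * z0, xbar + s * z0


print(student_limits([5.1, 4.8, 5.6, 5.0, 4.7, 5.3]))
```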
19.27. The possibility of finding confidence intervals in this case arose from our being able to find a statistic z, depending only on the parameter under estimate, whose distribution did not contain σ. A scale parameter can often be eliminated in this way, although the resulting distributions are not always easy to handle. If, for instance, we have a statistic t which is of degree p in the variables, then $t/s^p$ is of degree zero, and its distribution must be independent of the scale parameter. When a statistic is reduced to independence of the scale in this way it is said to be “studentised,” after “Student” (W. S. Gosset), who was the first to perceive the significance of the process.
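Here is a small check of the degree-zero argument, taking Student’s z as the studentised statistic. Multiplying every observation and the hypothetical mean by the same constant leaves z unchanged, because numerator and denominator are both of degree one. The sample values and the constant c are arbitrary choices made only for illustration.

```python
import math


def student_z(xs, mu):
    """Student's z = (xbar - mu) / s with s^2 = sum((x - xbar)^2) / n."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return (xbar - mu) / s


# Illustrative sample and hypothetical mean (not from the text)
xs = [2.3, 1.7, 3.1, 2.9]
mu = 2.0
c = 10.0  # a change of scale, e.g. a change of units

z_original = student_z(xs, mu)
z_rescaled = student_z([c * x for x in xs], c * mu)
```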

19.28. It is interesting to consider the relation between the studentised mean-statistic and confidence zones based on sufficient estimators in the normal case. The distribution of means and variances in normal samples is

$$dF \propto \frac{s^{n-2}}{\sigma^{n}}\exp\left[-\frac{n}{2\sigma^2}\left\{(\bar x - \mu)^2 + s^2\right\}\right]d\bar x\,ds, \qquad (19.33)$$

and $\bar x$, s are jointly sufficient for μ, σ. In the sample space W the regions of constant $\bar x$ are hyperplanes and those of constant s are hyperspheres. If we fix $\bar x$ and s, the sample-point E lies on a hypersphere of (n − 2) dimensions. Choose an area on this hypersphere whose content is a proportion α of the whole. Then the acceptance region will be obtained by combining all such areas for all $\bar x$ and s.

One such region is seen to be the “slice” of the sample-space obtained by rotating the hyperplane passing through the origin and the point (1, 1, ..., 1) through an angle πα (not 2πα, because a half-turn of the plane covers the whole space).

The situation is illustrated for n = 2 in Fig. 19.3.


[Fig. 19.3]

For any given μ′ the axis of rotation meets the hyperplane μ = μ′ in the point $x_1 = x_2 = \mu'$. The hypercones $(\bar x - \mu)/s$ = constant in the W space become the plane areas between two straight lines (shaded in the figure). These may be regarded as regions of acceptance. One set is obtained by rotating a plane about the line $x_1 = x_2 = \mu$ through an angle πα, so as to cut off in any plane an angle $\tfrac12\pi\alpha$ on each side of the line

$$x_1 + x_2 = 2\mu.$$

The boundary planes are given by

$$x_2 - \mu = (x_1 - \mu)\tan\left(\frac{\pi}{4} - \frac{\beta}{2}\right),$$

$$x_2 - \mu = (x_1 - \mu)\tan\left(\frac{\pi}{4} + \frac{\beta}{2}\right),$$

where $\beta = \pi(1-\alpha)$; or, after a little reduction,

$$\mu = \frac{x_1 + x_2}{2} \pm \frac{x_1 - x_2}{2}\cot\frac{\beta}{2}.$$

μ then lies in the region of acceptance if

$$\frac{x_1 + x_2}{2} - \frac{|x_1 - x_2|}{2}\cot\frac{\beta}{2} \le \mu \le \frac{x_1 + x_2}{2} + \frac{|x_1 - x_2|}{2}\cot\frac{\beta}{2}.$$

These are in fact the limits given by “Student’s” distribution for n = 2, since the sample variance then becomes

$$s^2 = \left(\frac{x_1 - x_2}{2}\right)^2,$$

so that, with $z_0 = z_1$,

$$\frac{1}{\pi}\int_{-z_0}^{z_0}\frac{dz}{1+z^2} = \alpha \quad\text{gives}\quad z_0 = \tan\left(\frac{\pi}{2} - \frac{\beta}{2}\right) = \cot\frac{\beta}{2}.$$

19.29. Tables or diagrams of the confidence intervals for selected values of α have been given for the following parameters:

(a) the proportion $\varpi$ in the binomial (Clopper and Pearson, 1934);

(b) the parameter of the Poisson distribution (Garwood, 1936; Ricker, 1937);

(c) the correlation coefficient in normal samples (David, 1938a);

(d) the median in samples from any population (K. R. Nair, 1940b).

In addition, results for the mean of a normal population may be obtained from “Student’s” integral as shown above. Those for the variance of a normal population may be obtained from the Γ-function or the equivalent χ²-integral. For simultaneous estimation of mean and variance there are difficulties, as we proceed to show.
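The binomial limits of (a) can be computed directly instead of being read from the charts of Clopper and Pearson. The sketch below assumes the equal-tailed construction, in which each tail carries (1 − α)/2. It solves for the two limits by bisection on exact binomial sums. The function names are illustrative, and the code is not taken from the original paper.

```python
from math import comb


def binom_cdf(k, n, p):
    """P{X <= k} for X binomial(n, p)."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def _solve_increasing(g, target, iters=100):
    """Bisection for p in [0, 1] with g(p) = target, where g increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def binomial_limits(x, n, alpha):
    """Equal-tailed limits for the binomial proportion, confidence coefficient alpha."""
    tail = (1 - alpha) / 2
    # Lower limit: P{X >= x | p} = tail, i.e. 1 - P{X <= x - 1} = tail
    lower = 0.0 if x == 0 else _solve_increasing(lambda p: 1 - binom_cdf(x - 1, n, p), tail)
    # Upper limit: P{X <= x | p} = tail, i.e. 1 - P{X <= x} = 1 - tail
    upper = 1.0 if x == n else _solve_increasing(lambda p: 1 - binom_cdf(x, n, p), 1 - tail)
    return lower, upper
```

When x = 0 the upper limit has the closed form $1 - \{\tfrac12(1-\alpha)\}^{1/n}$, which gives a simple check on the bisection.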


19.30. It might have been expected that the foregoing theory could be generalised to give simultaneous pairs of confidence intervals for two unknown parameters when intervals for each separately cannot be found. Very little progress in this direction has, however, been made. The difficulty may be illustrated by reference to the joint distribution of mean and variance (19.33). From the independent distributions of $(\bar x - \mu)/\sigma$ and $s/\sigma$ we can, given α and β, find $t_0$, $t_1$ and $u_0$, $u_1$ such that

$$P\left\{-t_1 \le \frac{\bar x - \mu}{\sigma} \le t_0\right\} = \alpha, \qquad P\left\{u_0 \le \frac{s}{\sigma} \le u_1\right\} = \beta,$$

where the t's and u's do not depend on μ or σ, and α, β may be chosen at will. The inequalities are equivalent to

$$\bar x - \sigma t_0 \le \mu \le \bar x + \sigma t_1, \qquad (19.34)$$

$$\frac{s}{u_1} \le \sigma \le \frac{s}{u_0}, \qquad (19.35)$$

and these give

$$\bar x - \frac{t_0}{u_0}\,s \le \mu \le \bar x + \frac{t_1}{u_0}\,s. \qquad (19.36)$$

But can we then infer that

$$P\left\{\bar x - \frac{t_0}{u_0}\,s \le \mu \le \bar x + \frac{t_1}{u_0}\,s\right\} = \gamma, \qquad (19.37)$$

where γ is a constant dependent on α and β? We cannot. This equation is, in fact, not generally true. The fact can be verified by considering the distribution of the statistic $\bar x - \ldots$ and showing that its distribution function F(u) is not independent of μ and σ.


19.31. In the next chapter we shall see that a similar problem, giving rise to Behrens'
test, provides a crucial point of difference between the theory of confidence intervals and 
that of fiducial intervals. All we need say here is that from the point of view of the former 
the problem of simultaneous confidence intervals for several parameters remains unsolved, 
except of course in the degenerate case when we can find independent intervals for each 
parameter separately. 


19.32. In conclusion we indicate without proof a few results which have recently been obtained.

(1) Wilks and Daly (1939b) have generalised the theorem of 19.12 to the case of several parameters. Under fairly general conditions the confidence regions which are shortest on the average are given by

$$\frac{1}{n}\sum_{r}\sum_{s} a^{rs}\,\frac{\partial \log L}{\partial \theta_r}\,\frac{\partial \log L}{\partial \theta_s} \le \chi^2_\alpha,$$

where $(a^{rs})$ is the inverse matrix to that whose general element is

$$E\left(\frac{\partial \log f}{\partial \theta_r}\,\frac{\partial \log f}{\partial \theta_s}\right),$$

and $\chi^2_\alpha$ is such that $P(\chi^2 \le \chi^2_\alpha) = \alpha$. The probability is calculated from the χ²-distribution with ν equal to the number of parameters. This is clearly related to the result of 17.46 giving the limiting forms of variances and covariances of maximum likelihood estimators.

(2) Wald (1942) has considered the problem of large samples from the point of view of most selective sets (“shortest” in Neyman's sense) and has proved results somewhat similar to those of Wilks and Daly.

(3) Wald and Wolfowitz (1939b, 1941e) and Kolmogoroff (1941) have considered the problem of setting confidence limits to the terminals of an unknown frequency-distribution.
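The sketch below illustrates the form of the Wilks and Daly region in the simplest case. It takes a single parameter, the mean λ of a Poisson distribution, for which the information per observation is 1/λ. It assumes the score form $\frac{1}{n}(\partial\log L/\partial\lambda)^2/i(\lambda) \le \chi^2_\alpha$ with one degree of freedom. The region then reduces to $n(\bar x - \lambda)^2/\lambda \le \chi^2_\alpha$, which is a quadratic inequality in λ. The Poisson example and the sample values are chosen here only for illustration.

```python
import math
from statistics import NormalDist


def chi2_1_point(alpha):
    """chi^2_alpha for one degree of freedom: the square of u with P(|Z| <= u) = alpha."""
    return NormalDist().inv_cdf((1 + alpha) / 2) ** 2


def poisson_score_statistic(xs, lam):
    """(1/n) * (d log L / d lambda)^2 / i(lambda), with per-observation information i = 1/lambda."""
    n = len(xs)
    score = sum(xs) / lam - n
    info = 1 / lam
    return score ** 2 / (n * info)


def poisson_score_region(xs, alpha):
    """Solve n*(xbar - lam)^2 / lam <= c, i.e. n*lam^2 - (2*n*xbar + c)*lam + n*xbar^2 <= 0."""
    n = len(xs)
    xbar = sum(xs) / n
    c = chi2_1_point(alpha)
    b = 2 * n * xbar + c
    disc = math.sqrt(c * c + 4 * n * xbar * c)  # equals sqrt(b^2 - 4 n^2 xbar^2)
    return (b - disc) / (2 * n), (b + disc) / (2 * n)
```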

NOTES AND REFERENCES 

When the theories of confidence intervals and of fiducial intervals were first developed, many statisticians regarded them as equivalent. In papers written between 1930 and 1938, “confidence limits” and “fiducial limits” are often used in the same sense. Even where a distinction of approach was drawn, the results given by the two methods appeared identical. The case of Behrens' test, however, provided an illustration where the methods lead to different results (see the following chapter).

The fiducial approach is due to R. A. Fisher, and references are given at the end of Chapter 20. The approach of the present chapter has been developed mainly by Neyman (see particularly 1937b), E. S. Pearson, Wilks (1938b, c, 1939a and, with Daly, 1939b), Wald (1939a, 1942), Welch (1939a), and Bartlett (1936a, 1939a). A number of the references to Chapters 26 and 27 are also relevant.

Confidence intervals which are independent of the form of distribution can be obtained for the median and other quantiles. See Thompson (1936), Savur (1937a) and K. R. Nair (1940b), and compare Exercise 19.5.


EXERCISES 

19.1. Show that for the rectangular population

$$dF = \frac{dx}{\theta}, \qquad 0 \le x \le \theta,$$

and confidence coefficient α, confidence limits for θ are t and t/ψ, where t is the sample range and ψ is given by

$$\psi^{n-1}\{n - (n-1)\psi\} = 1 - \alpha.$$

(Wilks, 1938c.)
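Exercise 19.1 can be checked numerically. The sketch below solves the equation for ψ by bisection. The left-hand side is the probability that the range divided by θ does not exceed ψ, so it increases with ψ. The sketch then estimates the coverage of the limits t and t/ψ by simulation. The simulation settings and seed are arbitrary.

```python
import random


def range_psi(n, alpha):
    """Solve psi^(n-1) * (n - (n-1)*psi) = 1 - alpha for psi in [0, 1] by bisection."""

    def g(p):
        # P{w <= p} for w = (sample range) / theta in samples of n
        return p ** (n - 1) * (n - (n - 1) * p)

    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if g(mid) < 1 - alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def range_limits(xs, alpha):
    """Confidence limits t and t/psi for theta, where t is the sample range."""
    t = max(xs) - min(xs)
    return t, t / range_psi(len(xs), alpha)


def coverage(theta, n, alpha, trials=20000, seed=2):
    """Monte Carlo proportion of samples with t <= theta <= t/psi."""
    rng = random.Random(seed)
    psi = range_psi(n, alpha)
    hits = 0
    for _ in range(trials):
        xs = [rng.uniform(0, theta) for _ in range(n)]
        t = max(xs) - min(xs)
        hits += t <= theta <= t / psi
    return hits / trials
```

For n = 2 the equation is ψ(2 − ψ) = 1 − α, so ψ = 1 − √α.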


19.2. Show that, for the distribution of the previous exercise, confidence limits for samples of two, $x_1$ and $x_2$, are

$$\frac{x_1 + x_2}{1 + \sqrt{1-\alpha}}, \qquad \frac{x_1 + x_2}{1 - \sqrt{1-\alpha}}.$$

(Neyman, 1937b.)


19.3. Show also, in the case of the previous exercises, that if L is the larger of a sample of two, confidence limits are

$$L, \qquad \frac{L}{\sqrt{1-\alpha}}.$$

(Neyman, 1937b.)

Show further that if M is the largest of samples of four, confidence limits are

$$M, \qquad \frac{M}{(1-\alpha)^{1/4}}.$$

(For an experimental verification, see Frankel and Kullback, 1940.)
19.4. Show that, for the distribution

$$dF = e^{-(x-\theta)}\,dx, \qquad \theta \le x < \infty,$$

central confidence limits for large samples with α = 0.95 are given by

(Wilks, 1938c.)


19.5. If a frequency function is continuous, the probability that the kth of a sample of n (arranged in ascending order of magnitude) lies in the range dx is

$$\frac{1}{B(k, n-k+1)}\,F^{k-1}(1-F)^{n-k}\,dF,$$

where F is the distribution function. Deduce that

$$P\{x_k < M < x_{n-k+1}\} = 1 - 2I_{0.5}(n-k+1, k),$$

where M is the median, and hence show how to determine confidence intervals for M from the incomplete B-function.

Generalise the result for quantiles. Show that the results do not hold for discontinuous distributions.

(Thompson, 1936.)
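For Exercise 19.5 the incomplete B-function value can be replaced by a finite binomial sum. The number of observations below M is binomial with parameters n and ½, so $1 - 2I_{0.5}(n-k+1, k) = \sum_{j=k}^{n-k}\binom{n}{j}2^{-n}$. The sketch below uses this sum to pick the pair of order statistics with the largest k whose coverage is at least α. That selection rule is an assumption of the sketch, and, as the exercise warns, the result applies only to continuous populations.

```python
from math import comb


def median_interval_prob(n, k):
    """P{x_(k) < M < x_(n-k+1)} for a continuous population.

    The number J of observations below the median is binomial(n, 1/2), and
    the event holds exactly when k <= J <= n - k.
    """
    return sum(comb(n, j) for j in range(k, n - k + 1)) / 2 ** n


def median_interval(xs, alpha):
    """Largest k with coverage >= alpha: returns (x_(k), x_(n-k+1), coverage), or None."""
    ys = sorted(xs)
    n = len(ys)
    best = None
    for k in range(1, n // 2 + 1):
        p = median_interval_prob(n, k)
        if p >= alpha:
            best = (ys[k - 1], ys[n - k], p)
        else:
            break  # coverage decreases as k increases
    return best
```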



CHAPTER 20 

FIDUCIAL INFERENCE 


20.1. We now proceed to examine a type of inference known as fiducial. As in other methods of estimation, given a distribution of known form depending on an unknown parameter θ, we shall attempt to find limits between which θ lies in some sense associated with the theory of probability. To that extent our present approach is similar to the use of estimators with their associated sampling error and to the use of confidence intervals. It is, however, distinct from the latter both in essential ideas and in some of the results to which it leads.


20.2. Consider samples of n from a normal population of unknown mean and unit variance. The sample-mean $\bar x$ is sufficient for μ and its distribution is

$$dF = \sqrt{\frac{n}{2\pi}}\exp\left\{-\tfrac12 n(\bar x - \mu)^2\right\}d\bar x. \qquad (20.1)$$

In speaking of a distribution in this sense we regard μ as fixed and consider the totality of values of $\bar x$ derived by random sampling from the population with given μ. The proportion of samples falling in a range $d\bar x$ is then given by (20.1), which holds for each value of μ.

We now change our viewpoint and consider a different kind of distribution based on (20.1). If we are given a value of $\bar x$ from a sample, what are the values of μ which could have given rise to this value to any fixed level of probability? If the deviation $\bar x - \mu$ is written as h, we know that the probability of the inequality

$$\bar x - \mu \le h \qquad (20.2)$$

being true is α, where α depends on h and is in fact

$$\alpha = \sqrt{\frac{n}{2\pi}}\int_{-\infty}^{h}\exp\left(-\tfrac12 n y^2\right)dy. \qquad (20.3)$$

Looking at this the other way round, we may say that given any α we can find h, a function of α only, such that

$$\mu \ge \bar x - h \qquad (20.4)$$

is true with probability α. For any fixed $\bar x$ this gives us a distribution of μ. Consider in fact the equation

$$\mu = \bar x - h. \qquad (20.5)$$

If μ has a distribution function F(μ), we have, since (20.4) is true with probability α,

$$1 - F(\bar x - h) = \sqrt{\frac{n}{2\pi}}\int_{-\infty}^{h}\exp\left(-\tfrac12 n y^2\right)dy,$$

whence

$$-\frac{dF(\bar x - h)}{dh} = \sqrt{\frac{n}{2\pi}}\exp\left(-\tfrac12 n h^2\right).$$

But in virtue of (20.5), dμ = −dh and $h = \bar x - \mu$. Thus

$$dF = \sqrt{\frac{n}{2\pi}}\exp\left\{-\tfrac12 n(\mu - \bar x)^2\right\}d\mu. \qquad (20.6)$$

This is called the fiducial distribution of μ.
20.3. It so happens that in this example the non-differential parts of (20.6) and (20.1) are the same. This is not essential, although it is not infrequent. The crucial point of difference lies in the appearance of the differential element dμ, relating to the variation of μ, and the disappearance of $d\bar x$, relating to the variation of $\bar x$. We have derived a distribution of the parameter μ from that of the random variable $\bar x$. We did so by transferring our attention in (20.4) from $\bar x$ to μ and regarding the inequality as still satisfied with probability α.

20.4. We note in the first place that this distribution is not necessarily existent. When we come to make an inference in any particular case, we do not assume that μ is itself distributed in the fiducial form, in the sense that it has been chosen at random from an existent population of μ's of that form. Such a prior distribution, which would be required for the application of Bayes' theorem, is not admissible from the point of view of the frequency theory of probability. The fiducial distribution is a hypothetical one of conceivable values of μ. We attach probabilities to these values, or rather to values in the range dμ, by identifying them with the probabilities (based on frequency) which are derived from the distribution of a sufficient estimator of μ. For this reason the fiducial distribution is not a frequency-distribution in the ordinary sense, but it is a probability distribution in its own special sense. We use it to make statements of the following kind: among the values of μ which are possible, only those in a certain range give rise to the observed $\bar x$ with probability α, and hence we will locate μ in that range.

20.5. In our present example the argument would proceed as follows. From equation (20.6) and the use of the normal integral, the probability that $\mu - \bar x$ does not exceed a certain h is ascertainable as a function of h; for instance,

$$P\left\{\mu - \bar x \le \frac{2}{\sqrt n}\right\} = 0.9772.$$

If we regard a probability as high as this as acceptable, we may say that $\mu \le \bar x + 2/\sqrt n$.

This result is equivalent to that given by the theory of confidence intervals, for if we assert $\mu \le \bar x + 2/\sqrt n$ we shall be right in the long run in 97.72 per cent of the cases. This identity of result is found in most elementary cases where a single parameter is concerned, but it is to be regarded as accidental. In the theory of confidence intervals two things are fundamental: (a) that the assertion as to the parameter lying in a given range should be true in an assigned proportion α of the cases, and (b) that no assumption need be made as to the prior distribution of the parameter, either in the frequency sense or in the fiducial sense. In fiducial theory it is not necessary that (a) should be true, but the fiducial distribution is a fundamental part of the inference.
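The two readings of 20.5 can be compared numerically. In this sketch the fiducial probability is taken from (20.6) as the normal integral Φ(2). The long-run frequency of the assertion $\mu \le \bar x + 2/\sqrt n$ is estimated by simulating samples with unit variance. The population mean, sample size and seed are arbitrary assumptions.

```python
import math
import random
from statistics import NormalDist


def fiducial_upper_prob(h_units):
    """Fiducial P{mu - xbar <= h} with h = h_units / sqrt(n); from (20.6) this is Phi(h_units)."""
    return NormalDist().cdf(h_units)


def long_run_frequency(mu, n, h_units=2.0, trials=20000, seed=3):
    """Frequency with which mu <= xbar + h_units/sqrt(n) holds in repeated
    samples of n from a normal population with mean mu and unit variance."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        hits += mu <= xbar + h_units / math.sqrt(n)
    return hits / trials
```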

20.6. There is a further distinction between the two theories. In the theory of confidence intervals it is possible to have two entirely different sets for the same parameter, and in fact part of that theory is devoted to finding “best” sets among the possible ones. In fiducial theory such a state of affairs must not be possible, for different limits would imply different fiducial distributions for the same parameter on the same evidence. This is avoided by confining fiducial distributions to those based on sufficient estimators or, more generally, on a set of estimators which together avoid all loss of information. Since such estimators alone contain all the information relevant to the problem of estimation, they alone can give the fiducial distributions accurately. It follows, of course, that where no sufficient estimator (or estimator with a complete set of ancillary estimators) can be found, the fiducial method is inapplicable.


20.7. Generally, let F(t, θ) be the distribution function of a sufficient estimator t for a parameter θ. Then for the frequency distribution of t we have

$$dF = \frac{\partial F(t,\theta)}{\partial t}\,dt. \qquad (20.7)$$

F(t, θ) is the probability that a random value of the estimator does not exceed a given value t. In accordance with the fiducial principle, this may be equated to the probability that, for fixed t, the parameter will exceed a given value θ. For the fiducial distribution of θ we therefore have

$$dF = \frac{\partial}{\partial\theta}\{1 - F(t,\theta)\}\,d\theta = -\frac{\partial F(t,\theta)}{\partial\theta}\,d\theta. \qquad (20.8)$$

This shows the general relation between the frequency-distribution of the estimator and the fiducial distribution of the parameter.
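Relation (20.8) can be checked numerically for the normal example of 20.2, where t = x̄ and F(t, θ) = Φ{√n(t − θ)}. The sketch below differentiates F with respect to θ by a central difference. It then compares the result with the fiducial density (20.6). The step size h is an assumption chosen to suit double-precision arithmetic.

```python
import math
from statistics import NormalDist


def estimator_cdf(t, theta, n):
    """F(t, theta) for t = xbar in samples of n from a normal population, mean theta, unit variance."""
    return NormalDist().cdf(math.sqrt(n) * (t - theta))


def fiducial_density(t, theta, n, h=1e-5):
    """Fiducial density of theta from (20.8), -dF(t, theta)/dtheta, by a central difference."""
    return -(estimator_cdf(t, theta + h, n) - estimator_cdf(t, theta - h, n)) / (2 * h)
```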


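Example 20.1

If p is known, the estimator $\hat\theta = \bar x/p$ is sufficient for θ in samples from

$$dF = \frac{1}{\theta^p\,\Gamma(p)}\,e^{-x/\theta}x^{p-1}\,dx, \qquad 0 \le x < \infty,$$

the distribution of $\hat\theta$ being, in fact,

$$dF = \frac{1}{\Gamma(np)}\left(\frac{np}{\theta}\right)^{np}\hat\theta^{\,np-1}e^{-np\hat\theta/\theta}\,d\hat\theta. \qquad (20.9)$$

(Cf. Example 17.8.) We may write this in the form

$$dF = \frac{1}{\Gamma(np)}\left(\frac{np\hat\theta}{\theta}\right)^{np}e^{-np\hat\theta/\theta}\,\frac{d\hat\theta}{\hat\theta}.$$

F depends on $\hat\theta$ and θ only through $\hat\theta/\theta$, so

$$-\frac{\partial F}{\partial\theta} = \frac{\hat\theta}{\theta}\,\frac{\partial F}{\partial\hat\theta}.$$

It is then clear that the corresponding fiducial distribution of θ is

$$dF = \frac{1}{\Gamma(np)}\left(\frac{np\hat\theta}{\theta}\right)^{np}e^{-np\hat\theta/\theta}\,\frac{d\theta}{\theta}, \qquad (20.10)$$

which may also be put in the form (20.9), provided that we interpret the differential element now as relating to θ and not to $\hat\theta$. It will be noticed that we have replaced $d\hat\theta/\hat\theta$ by $d\theta/\theta$, not merely $d\hat\theta$ by $d\theta$.

From the fiducial distribution (20.10) we can find the probability that θ lies in a certain range dependent on the observed $\hat\theta$ and the chosen probability α. This is in fact the same range that we should obtain by applying confidence intervals to (20.9). Once again the results of the two methods are the same.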
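Fiducial limits from (20.10) can be computed once the Type III integral is available. To keep the sketch self-contained, it assumes that np is an integer. The integral then reduces to a finite Poisson sum, and the limits follow from the quantiles of $u = np\hat\theta/\theta$. The function names are illustrative. As the text notes, the same limits are also the confidence limits derived from (20.9).

```python
import math


def gamma_cdf_int(u, m):
    """P{U <= u} for U with density u^(m-1) e^(-u) / Gamma(m), m a positive integer."""
    term, total = math.exp(-u), 0.0
    for k in range(m):
        total += term  # e^(-u) u^k / k!
        term *= u / (k + 1)
    return 1.0 - total


def gamma_quantile_int(q, m):
    """Inverse of gamma_cdf_int, by bracketing and then bisection."""
    lo, hi = 0.0, 1.0
    while gamma_cdf_int(hi, m) < q:
        hi *= 2
    for _ in range(200):
        mid = (lo + hi) / 2
        if gamma_cdf_int(mid, m) < q:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def fiducial_limits_theta(theta_hat, n, p, alpha):
    """Central fiducial limits for theta from (20.10); assumes n*p is a positive integer."""
    m = round(n * p)
    if m < 1 or abs(n * p - m) > 1e-9:
        raise ValueError("this sketch assumes n*p is a positive integer")
    tail = (1 - alpha) / 2
    u_low = gamma_quantile_int(tail, m)
    u_high = gamma_quantile_int(1 - tail, m)
    # u = n p theta_hat / theta, so a large u corresponds to a small theta
    return m * theta_hat / u_high, m * theta_hat / u_low
```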
Fiducial Inference based on “Student’s” Distribution

20.8. Consider now the estimation of the mean μ in samples from a normal population with unknown variance σ². The treatment of 20.2 is no longer of use, for it would result in a fiducial distribution of μ containing the unknown σ. We therefore “studentise” the problem by considering the distribution of

$$t = \frac{(\bar x - \mu)\sqrt n}{s'}, \qquad (20.11)$$

which is independent of σ, being in fact

$$dF \propto \frac{dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}, \qquad (20.12)$$

where ν = n − 1. Here s′² is the unbiassed estimate of the population variance,

$$s'^2 = \frac{1}{n-1}\sum(x - \bar x)^2.$$

The distribution of t may be written

$$dF \propto \frac{d\bar x}{\left\{1 + \dfrac{n(\bar x - \mu)^2}{(n-1)s'^2}\right\}^{n/2}}. \qquad (20.13)$$

The fiducial distribution is then

$$dF \propto \frac{d\mu}{\left\{1 + \dfrac{n(\mu - \bar x)^2}{(n-1)s'^2}\right\}^{n/2}}. \qquad (20.14)$$

In the usual way we can find, for any given α, two constants $\mu_0$ and $\mu_1$ such that, from (20.14),

$$P\{\mu_0 \le \mu \le \mu_1\} = \alpha, \qquad (20.15)$$

the probability being based on (20.14) and therefore to be understood in the fiducial sense. Had we worked with (20.12) or (20.13) we should have found $t_0$, $t_1$ such that

$$P\{-t_1 \le t \le t_0\} = \alpha,$$

which is equivalent to

$$P\left\{\bar x - \frac{t_0 s'}{\sqrt n} \le \mu \le \bar x + \frac{t_1 s'}{\sqrt n}\right\} = \alpha. \qquad (20.16)$$

This may be interpreted in the sense of confidence intervals: in asserting the inequality in (20.16) we should be right in a proportion α of the cases in the long run. (20.15) does not rest on this statement as to frequency, though the limits to which it leads are the same and the statement happens to be true.


20.9. The case we have just discussed raises a new point. Is it still true that the fiducial distribution is unique, and is it consistent with the distributions of μ and σ separately? The distribution is based only on the sufficient estimators $\bar x$ and s′, which are jointly but not separately sufficient for μ and σ, and we should expect this to be so. But the matter requires investigation, for we are here using a fiducial distribution based on two estimators.



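The simultaneous distribution of $\bar x$ and s′ is

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}d\bar x\;\frac{s'^{\,n-2}}{\sigma^{n-1}}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\}ds'. \qquad (20.17)$$

If we were considering fiducial limits for μ with known σ, we should use the distribution

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n}{2\sigma^2}(\bar x - \mu)^2\right\}d\bar x.$$

If we were considering fiducial limits for σ with known μ, we should not use the other factor in (20.17),

$$dF \propto \frac{s'^{\,n-2}}{\sigma^{n-1}}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\}ds', \qquad (20.18)$$

for in such circumstances s′ is not sufficient for σ, the appropriate estimator being $\frac{1}{n}\sum(x - \mu)^2$. The question is, what form of fiducial distribution must hold for σ in order that the “Student” form (20.14) should hold for μ when σ is unknown?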

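Suppose the fiducial distribution is f(s′, σ) dσ. We have then for the joint fiducial distribution of μ and σ

$$dF \propto \frac{1}{\sigma}\exp\left\{-\frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}d\mu\;f(s',\sigma)\,d\sigma. \qquad (20.19)$$

We have therefore to solve

$$\int_0^\infty \frac{1}{\sigma}\exp\left\{-\frac{n(\bar x - \mu)^2}{2\sigma^2}\right\}f(s',\sigma)\,d\sigma = \frac{k}{\left\{1 + \dfrac{n(\bar x - \mu)^2}{(n-1)s'^2}\right\}^{n/2}},$$

where k is some constant. Put $(\mu - \bar x)^2 = a$ and $n/(2\sigma^2) = \rho$, and write $g(\rho)\,d\rho$ for $\sigma^{-1}f(s',\sigma)\,d\sigma$. We have then to solve

$$\int_0^\infty e^{-a\rho}g(\rho)\,d\rho = k\left\{1 + \frac{na}{(n-1)s'^2}\right\}^{-n/2}.$$

Regarding a as the complex quantity −it, we see that g(ρ) is the frequency function whose characteristic function is $\left\{1 - \dfrac{int}{(n-1)s'^2}\right\}^{-n/2}$, which gives

$$g(\rho) \propto \rho^{\frac12 n - 1}\exp\left\{-\frac{(n-1)s'^2\rho}{n}\right\},$$

from which we find

$$f(s',\sigma) \propto \frac{1}{\sigma^n}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\},$$

or, on evaluation of the constant,

$$f(s',\sigma)\,d\sigma = \frac{2}{\Gamma\{\tfrac12(n-1)\}}\left\{\frac{(n-1)s'^2}{2}\right\}^{\frac12(n-1)}\exp\left\{-\frac{(n-1)s'^2}{2\sigma^2}\right\}\frac{d\sigma}{\sigma^n}. \qquad (20.20)$$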
This, then, is the fiducial distribution which σ must obey. We should have arrived at the same result had we taken (20.18) and transformed it to the fiducial form, as if it related to s′ and σ only and the former were sufficient for the latter.

It appears, then, that in this case at least the fiducial method gives consistent results when two parameters are involved. The general problem of many parameters presents difficulties and has not been elucidated to any great extent.

The Logic of Fiducial Inference

20.10. The notion of fiducial probability was introduced by Fisher (1930) for the case of a single parameter. Regarding the estimate t as fixed, Fisher considers the distribution of values of θ for which t can be regarded as a representative estimate. It is representative in the sense that it could have arisen by random sampling from the population specified by θ. As pointed out above, this does not mean that we are regarding the true value of θ as a member of an existing population. Rather, we are considering the possible values of θ and attaching to each value a measure of our confidence in it, based on the probability that it could have given rise to the observed t.

If I interpret him correctly, Fisher would regard a fiducial distribution as a frequency-distribution. This implies that θ is regarded as a random variable. It appears to me, however, that it is not a random variable in the ordinary sense of the frequency theory of probability, in which values of θ either are or can be generated by an actual sampling process. We can never test whether the fiducial distribution holds in the frequency sense by drawing a number of values and comparing observation with theory. Nor, in calculating fiducial limits of the type θ = t + h(α), do we imply that the proportion of cases for which θ ≤ t + h is true will be α in the long run.

20.11. The reader has a choice of several attitudes towards the foundations of the fiducial argument:

(a) he can accept the argument as involving a new postulate of inference;

(b) he can regard it as sanctioned by the approach of the previous section; or

(c) so far as estimates based on a single parameter are concerned, he can console himself with the thought that the results of the process are the same as those given by the theory of confidence intervals.

20.12. Fisher is careful to emphasise the distinction between his own approach and that based on Bayes' postulate. It is nevertheless interesting to note that the theory of inverse probability, as modified by Jeffreys, gives results which are in many cases identical with those of fiducial inference.

In the example of 20.2, for instance, suppose that the prior distribution of μ is f(μ) dμ. Then for any given $\bar x$ the posterior probability of μ is

$$dF = f(\mu)\,d\mu\,\sqrt{\frac{n}{2\pi}}\exp\left\{-\tfrac12 n(\bar x - \mu)^2\right\}. \qquad (20.21)$$

If the total probability is unity we have

$$\int_{-\infty}^{\infty} f(\mu)\sqrt{\frac{n}{2\pi}}\exp\left\{-\tfrac12 n(\bar x - \mu)^2\right\}d\mu = 1. \qquad (20.22)$$

Clearly f(μ) = 1 is a solution, and we may use characteristic functions to show that it is the only solution. In fact, writing it for $n\bar x$, we have from (20.22)

$$\int_{-\infty}^{\infty} f(\mu)\exp\left(-\tfrac12 n\mu^2\right)e^{it\mu}\,d\mu = \sqrt{\frac{2\pi}{n}}\exp\left(-\frac{t^2}{2n}\right).$$

The expression on the right is the characteristic function of $\exp(-\tfrac12 n\mu^2)$, and hence

$$f(\mu)\exp\left(-\tfrac12 n\mu^2\right) = \exp\left(-\tfrac12 n\mu^2\right),$$

or f(μ) = 1.

We have, then, for the posterior probability distribution of μ,

$$dF = \sqrt{\frac{n}{2\pi}}\exp\left\{-\tfrac12 n(\mu - \bar x)^2\right\}d\mu, \qquad (20.23)$$

which is the same as the fiducial distribution. The requirement that f(μ) = 1 is equivalent to a prior distribution of μ, dF = dμ. This is the form given by Bayes' postulate for a parameter which can extend to infinity in either direction.

Example 20.2 

In Example 20.1, a similar argument leads to a prior distribution of θ,

$$dF \propto \frac{d\theta}{\theta}.$$

This is the form given by Jeffreys' modification of Bayes' postulate when a parameter can extend to infinity in only one direction.

It does not appear, however, that fiducial and inverse probability always give the same results. Consider the distribution of the correlation coefficient in normal samples (14.14),

$$dF = \frac{(1-\rho^2)^{\frac12(n-1)}(1-r^2)^{\frac12(n-4)}}{\pi\,(n-3)!}\,\frac{d^{\,n-2}}{d(r\rho)^{n-2}}\left\{\frac{\cos^{-1}(-\rho r)}{\sqrt{1-\rho^2r^2}}\right\}dr. \qquad (20.24)$$

The argument of the type we have just employed would require a prior distribution of ρ,

$$dF \propto \frac{d\rho}{(1-\rho^2)^{3/2}}.$$

The resulting posterior distribution, which is equivalent to that obtained by interchanging r and ρ in (20.24), is not the same as the one we should get by using equation (20.8).

Behrens' Test

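20.13. Suppose we have two samples of $n_1$ and $n_2$ members from normal populations with possibly unequal variances. The fiducial distributions of $\mu_1$ and $\mu_2$ are of the “Student” form (20.14). Write $s_1 = s_1'/\sqrt{n_1}$, $s_2 = s_2'/\sqrt{n_2}$, and

$$\mu_1 = \bar x_1 + s_1 t_1, \qquad \mu_2 = \bar x_2 + s_2 t_2,$$

where $t_1$ and $t_2$ are distributed fiducially in “Student’s” form with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom. We have

$$(\mu_1 - \mu_2) - (\bar x_1 - \bar x_2) = s_1 t_1 - s_2 t_2. \qquad (20.25)$$

If now

$$e = \frac{(\mu_1 - \mu_2) - (\bar x_1 - \bar x_2)}{\sqrt{s_1^2 + s_2^2}}, \qquad (20.26)$$

then e depends only on the known quantities $\bar x_1$, $\bar x_2$, $s_1$, $s_2$ and the difference of means $\mu_1 - \mu_2$. From the fiducial distributions of $\mu_1$ and $\mu_2$ we can find that of e, and hence make fiducial statements of the type

$$\bar x_1 - \bar x_2 - e_0\sqrt{s_1^2 + s_2^2} \le \mu_1 - \mu_2 \le \bar x_1 - \bar x_2 + e_1\sqrt{s_1^2 + s_2^2}. \qquad (20.27)$$

The distribution of e is not of a simple form. Putting $\tan\psi = s_2/s_1$, we see that

$$e = t_1\cos\psi - t_2\sin\psi, \qquad (20.28)$$

so that e is distributed fiducially as the weighted difference of two variables, each of which is distributed as “Student’s” t. We have then to find the distribution of $e = t_1\cos\psi - t_2\sin\psi$, where the joint distribution of $t_1$ and $t_2$ is given by

$$dF \propto \frac{dt_1\,dt_2}{\left(1 + \dfrac{t_1^2}{\nu_1}\right)^{\frac12(\nu_1+1)}\left(1 + \dfrac{t_2^2}{\nu_2}\right)^{\frac12(\nu_2+1)}}. \qquad (20.29)$$

The distribution has been studied by Sukhatme (1938b) and in more detail by Fisher (1941a). Tables are given for various values of $n_1$, $n_2$ and the ratio $s_1/s_2$ (or the equivalent angle ψ), showing the values of e corresponding to given probability levels. Some of the tables are included in the second (1943) edition of Fisher and Yates' Statistical Tables for Biological, Agricultural and Medical Research.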
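As a rough alternative to the tables of Sukhatme and Fisher, the distribution of e in (20.28) can be approximated by simulation. The sketch below draws $t_1$ and $t_2$ independently in “Student’s” form, forms $e = t_1\cos\psi - t_2\sin\psi$, and reads off an empirical quantile. The limits (20.27) then follow, taking $s_i = s_i'/\sqrt{n_i}$ and tan ψ = s₂/s₁ as in 20.13. The number of draws and the seed are arbitrary. The result carries Monte Carlo error, so treat it as a numerical check only.

```python
import math
import random


def behrens_quantile(q, nu1, nu2, psi, draws=100000, seed=4):
    """Monte Carlo q-quantile of e = t1*cos(psi) - t2*sin(psi), t1 and t2 independent Student variates."""
    rng = random.Random(seed)

    def student_t(nu):
        # Normal deviate over sqrt(chi-square/nu); chi-square(nu) is Gamma(shape nu/2, scale 2)
        return rng.gauss(0.0, 1.0) / math.sqrt(rng.gammavariate(nu / 2, 2.0) / nu)

    sample = sorted(
        student_t(nu1) * math.cos(psi) - student_t(nu2) * math.sin(psi) for _ in range(draws)
    )
    return sample[int(q * (draws - 1))]


def behrens_limits(xbar1, s1_dash, n1, xbar2, s2_dash, n2, alpha, **mc):
    """Central fiducial limits (20.27) for mu1 - mu2, with s_i = s_i'/sqrt(n_i) and tan(psi) = s2/s1."""
    s1 = s1_dash / math.sqrt(n1)
    s2 = s2_dash / math.sqrt(n2)
    psi = math.atan2(s2, s1)
    e = behrens_quantile((1 + alpha) / 2, n1 - 1, n2 - 1, psi, **mc)  # e0 = e1 by symmetry
    half = e * math.sqrt(s1 ** 2 + s2 ** 2)
    d = xbar1 - xbar2
    return d - half, d + half
```

With ψ = 0 the variable e reduces to $t_1$, so the simulated quantile should be close to the ordinary “Student” value for $\nu_1$ degrees of freedom.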


20.15. The joint distribution of $s_1'^2$ and $s_2'^2$ is

$$dF \propto (s_1'^2)^{\frac12(n_1-3)}(s_2'^2)^{\frac12(n_2-3)}\exp\left\{-\frac{(n_1-1)s_1'^2}{2\sigma_1^2} - \frac{(n_2-1)s_2'^2}{2\sigma_2^2}\right\}ds_1'^2\,ds_2'^2.$$

Putting

$$u = \tfrac12\left\{\frac{(n_1-1)s_1'^2}{\sigma_1^2} + \frac{(n_2-1)s_2'^2}{\sigma_2^2}\right\}, \qquad p = \frac{(n_2-1)s_2'^2/\sigma_2^2}{(n_1-1)s_1'^2/\sigma_1^2},$$

we find, on a little reduction,

$$dF \propto \frac{p^{\frac12(n_2-3)}}{(1+p)^{\frac12(n_1+n_2-2)}}\,dp\;u^{\frac12(n_1+n_2)-2}e^{-u}\,du. \qquad (20.30)$$

Thus u is distributed (independently of p) in the Type III form. Further, $(\bar x_1 - \mu_1) - (\bar x_2 - \mu_2)$ is distributed normally about zero mean with variance $\sigma_1^2/n_1 + \sigma_2^2/n_2$.

Hence, if $\sigma_1^2/\sigma_2^2 = \theta$, we find that the quotient

$$\frac{\{(\bar x_1 - \mu_1) - (\bar x_2 - \mu_2)\}^2(n_1+n_2-2)}{2u\,(\sigma_1^2/n_1 + \sigma_2^2/n_2)} = \frac{e^2\left(\dfrac{s_1'^2}{n_1} + \dfrac{s_2'^2}{n_2}\right)(n_1+n_2-2)}{\left(\dfrac{\theta}{n_1} + \dfrac{1}{n_2}\right)\left\{\dfrac{(n_1-1)s_1'^2}{\theta} + (n_2-1)s_2'^2\right\}} \qquad (20.31)$$

is distributed as t² with $n_1 + n_2 - 2$ degrees of freedom. (Cf. Example 10.17, vol. I, p. 248, for the distribution of a normal variate divided by a Type III variate.)

Now if we knew θ we could find fiducial (or confidence) limits to e, and hence to $\mu_1 - \mu_2$, in the usual way. The distribution of e would then be independent of unknown constants and ascertainable from “Student’s” integral. Since, however, θ is not known, we require in turn the fiducial distribution of this quantity. Since

$$z = \tfrac12\log\frac{s_1'^2/\sigma_1^2}{s_2'^2/\sigma_2^2} = \tfrac12\log\frac{s_1'^2}{\theta\,s_2'^2}$$

is distributed in Fisher's form (cf. Example 10.18, vol. I, p. 249), the required fiducial form for θ can be obtained from that of z. This form is, incidentally, equivalent to that of p in (20.30). If we express (20.31) as the joint fiducial distribution of e and θ and integrate out for θ, we shall be left with a form equivalent to the one derived from (20.29).

20.16. It also follows from the above that the inequality (20.27) is not satisfied in a proportion α of the cases independently of θ. The limits to $\mu_1 - \mu_2$ are therefore not confidence limits, although they are fiducial limits. It will, in fact, be evident enough from (20.31) that if we determine $t_0$ and $t_1$ so that the integral of “Student’s” form between those limits is α, then the corresponding limits for e, say $e_0$ and $e_1$, are dependent on the variance ratio $\theta = \sigma_1^2/\sigma_2^2$. This is fairly evident on general grounds. The point has been put beyond doubt by both Fisher (1937b) and Neyman (1941a), who have worked out particular cases of difference.

The fiducial distribution of e is an extension by Fisher of a result given by Behrens as early as 1929. It thus provides a crucial point of difference between the theory of fiducial inference and that of confidence intervals.


20.17. In conclusion, we will indicate the viewpoint of Jeffreys towards the type of problem dealt with by “Student’s” distribution for limits to the mean and Behrens' distribution for limits to the difference of two means.

If H denotes the general data, we have for the “Student” distribution

$$P\{dt \mid \mu, \sigma, H\} = \frac{k\,dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}. \qquad (20.32)$$

The expression on the left states the probability that t will lie in a given range dt on the assumption that H is true, the parent mean being μ and the parent variance σ². Since μ and σ do not appear on the right, they are irrelevant and may be suppressed. Hence

$$P\{dt \mid H\} = \frac{k\,dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}. \qquad (20.33)$$

Suppose now that we assume that

$$P\{dt \mid \bar x, s', H\} = f(t)\,dt. \qquad (20.34)$$

Then, as before, $\bar x$ and s′ may be suppressed and we have

$$P\{dt \mid H\} = f(t)\,dt,$$

and hence, by comparison with (20.33),

$$P\{dt \mid \bar x, s', H\} = \frac{k\,dt}{\left(1 + \dfrac{t^2}{\nu}\right)^{\frac12(\nu+1)}}. \qquad (20.35)$$

We can then proceed to find limits to t, given $\bar x$ and s′, in the usual way. Jeffreys emphasises, however, that this depends on a new postulate expressed by (20.34) which, though natural, is not trivial. Suppose we are comparing different distributions, samples from which give different $\bar x$'s and s′'s. The postulate amounts to assuming that the scale of the distribution of μ must be taken proportional to s′ and its mean displaced by the difference of sample means.
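20.18. In a similar way it will be found that to arrive at the Behrens distribution it is necessary to postulate that

$$P\{dt_1\,dt_2 \mid \bar x_1, \bar x_2, s_1', s_2', H\} = f(t_1, t_2)\,dt_1\,dt_2. \qquad (20.37)$$

Jeffreys' derivation of the Behrens form from Bayes' theorem would be as follows. The prior probability of $d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H$ is

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid H\} \propto \frac{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2}{\sigma_1\sigma_2}.$$

The likelihood, denoting the data by D, is

$$P\{D \mid \mu_1, \mu_2, \sigma_1, \sigma_2, H\} \propto \frac{1}{\sigma_1^{n_1}\sigma_2^{n_2}}\exp\left[-\frac{n_1(\mu_1 - \bar x_1)^2 + (n_1-1)s_1'^2}{2\sigma_1^2} - \frac{n_2(\mu_2 - \bar x_2)^2 + (n_2-1)s_2'^2}{2\sigma_2^2}\right].$$

Hence, by Bayes' theorem,

$$P\{d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2 \mid DH\} \propto \frac{1}{\sigma_1^{n_1+1}\sigma_2^{n_2+1}}\exp\left[-\frac{n_1(\mu_1 - \bar x_1)^2 + (n_1-1)s_1'^2}{2\sigma_1^2} - \frac{n_2(\mu_2 - \bar x_2)^2 + (n_2-1)s_2'^2}{2\sigma_2^2}\right]d\mu_1\,d\mu_2\,d\sigma_1\,d\sigma_2.$$

Integrating out the values of $\sigma_1$ and $\sigma_2$, we find for the posterior distribution of $\mu_1$ and $\mu_2$ a form which is easily reducible to (20.29).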

20.19. To sum up: so far as concerns problems of estimation, the Behrens test is accurate both in fiducial theory and in the theory of probability propounded by Jeffreys. But the test does not hold in the theory of confidence intervals. In fact the latter fails to provide an exact solution to the problem, though we shall see below (21.28) that approximations are possible. Fisher has criticised confidence intervals on the ground that they do not give an answer to what is admittedly an important question. It appears possible, however, to maintain consistently that some questions may not have an answer.

NOTES AND REFERENCES 

For the general theory of fiducial inference see Fisher (1930a, 1933, 1935a, b, 1936c, 1941a). The difficulties of reconciling Behrens' test with confidence-interval theory were noticed by Bartlett (1936a) and led to some controversy, for which see Fisher (1937b, 1939a, 1940c), Bartlett (1939a), Yates (1939f), and Neyman (1941a). For Jeffreys' views see his papers of 1937b, 1938c, 1939d and 1940.

For the practical application of Behrens' distribution see Sukhatme (1938b) and Fisher (1941a). Behrens himself stated his results explicitly only for the case of equal sample numbers, $n_1 = n_2$. The extension was given by Fisher (1936b).

EXERCISES 

20.1. If $\bar x$ is the mean of a sample of n values from

$$dF = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}dx,$$

$s'^2$ is equal to $\frac{1}{n-1}\sum(x - \bar x)^2$, and x is a further independent sample value, show that

$$t = \frac{x - \bar x}{s'}\sqrt{\frac{n}{n+1}}$$

is distributed in “Student’s” form with ν = n − 1. Hence show that fiducial limits for x are

$$\bar x \pm t_1 s'\sqrt{\frac{n+1}{n}},$$

where $t_1$ is chosen so that the integral of “Student’s” form between $-t_1$ and $t_1$ is an assigned probability α.

(Fisher, 1935b. This gives an estimate of the next value when n values have already been chosen, and extends the idea of fiducial limits from parameters to variates dependent on them.)


20.2. Show similarly that if a sample of $n_1$ values gives mean $\bar x_1$ and estimated variance $s_1'^2$, the fiducial distribution of mean and estimated variance in a second sample of $n_2$ is

$$dF \propto \frac{(s_2'^2)^{\frac12(n_2-3)}\,d\bar x_2\,ds_2'^2}{\left\{(n_1-1)s_1'^2 + (n_2-1)s_2'^2 + \dfrac{n_1 n_2}{n_1+n_2}(\bar x_2 - \bar x_1)^2\right\}^{\frac12(n_1+n_2-1)}}.$$

Hence, allowing $n_2$ to tend to infinity, derive the simultaneous fiducial distribution of μ and σ.

(Fisher, 1936b.)



CHAPTER 21 

SOME COMMON TESTS OF SIGNIFICANCE 


Tests of Significance

21.1. We now pass from the problem of estimation to that of significance. The
two are closely allied and in practical problems they both arise together as a rule; but
it is useful to preserve a distinction between them. In estimation we try to find, with
greater or less accuracy, the value of some parameter in a population which is known
(or assumed) to depend on that parameter. In tests of significance we are given
some value of a parameter beforehand and wish to decide whether it is acceptable in the
light of the evidence. This is the distinction in its simplest terms, but of course the
associated problems become increasingly complex when several parameters are concerned.

21.2. From one point of view the problem of significance is logically anterior to that
of estimation. Suppose we have records of the yields of two varieties of wheat grown
under similar conditions, and are interested in a comparison of the average yields of the
two. Our first question is whether the observed mean yields indicate any difference between
the varieties, a matter of significance. Not until significant differences are established
does our interest turn to the magnitude of the difference, a matter of estimation. Again,
if we have a set of records of only one variety, our primary problem may be to decide
whether they are consonant with the hypothesis of normality in the parent population,
whatever its mean and variance; and only when this point has been settled affirmatively
do we proceed to estimate those parameters.

Nevertheless, we have lost very little by taking the problem of estimation first. In
some practical problems the question of significance is already decided, and in many others
we use estimates of parameters to test their significance, in which case estimation
and significance become different aspects of the same statistical fact.

21.3. We shall consider the general theory of testing statistical hypotheses in Chapters
26 and 27. That theory is, however, rather abstract, and we anticipate it to some extent 
in this chapter by giving an account of the principal tests in current use, without for the 
moment going too deeply into their rationale. It will be seen later that there are sometimes 
many significance tests which can be applied to the same problem, and that it is possible 
to lay down criteria for deciding which, if any, are the “ best ”. This aspect of the subject 
will not concern us for the present. We shall not discuss whether the tests we describe 
are the best possible (though some of them, in fact, are so) but shall merely present them 
as useful and convenient, albeit perhaps not unique, solutions of our problems. 

21.4. Developments in statistical theory in the last two decades have resulted in 
a great many tests of significance appropriate to special problems. It is not easy to classify 
them and quite impossible to deal extensively with them all. We shall consider them 
under the following heads:

(a) Tests of the significance of a specified parameter value. The typical hypothesis
here is that a parameter in a population of known form has a specified value (usually
zero). We wish to know whether the evidence provided by the sample supports this
hypothesis or not.

(b) Tests of goodness of fit. The hypothesis is that the population is of a certain
kind which is either fully specified beforehand or can be “estimated” with the help
of the data. We wish to know whether the sample values fit this population in the
sense that they could have arisen from it by random sampling to any acceptable degree
of probability. This hypothesis is more general than that of (a) since it concerns
the whole distribution function and not merely one of its parameters.

(c) Tests of homogeneity. The hypothesis here concerns two or more populations,
each providing a contribution to the sample. We wish to test whether the populations
have certain parameters in common or, in the extreme case, whether they are identical.
This case can be regarded as an elaboration of (a) where several parameters are
simultaneously tested. In the particular case when only two populations are concerned
we may sometimes reduce it directly to type (a) by considering differences; e.g. if
we are making a comparison of parent means the hypothesis might be that the single
difference of means is zero.

In addition we shall consider two sets of tests of a rather different kind:

(d) Tests of order of occurrence. The hypothesis here is that the sample members
occurred in random order, and we wish to ascertain whether the observed order indicates
any systematic effects, as, for instance, whether there are any cyclical effects in a
time-series. The test here is of the sampling process rather than of parameters of the
parent population.

(e) Conditional tests. The hypothesis may be any one of the above types, but
we restrict the inference to a sub-population for which certain qualities are
determined by the observed sample values. For instance, we may use the distribution
of the sample variance in those samples for which the mean $\bar{x}$ is equal to the observed
value. In short, the variation of sample values is conditioned. Type (d) may from some points
of view be regarded as a particular case of this type.

It is not intended to convey that the above five categories are mutually exclusive. 
A test of type (a) may, for example, be conditional or non-conditional. The classification 
will, however, provide some sort of articulation for a rather long chapter and serve to 
explain our sequence of treatment. 

Standard Errors 

21.5. For large samples the test of significance of a parameter can usually be carried
out by standard errors. We find an estimator $t$ of the parameter $\theta$ and consider whether
the given value of $\theta$ falls in the range $t_1 \pm k\sqrt{\operatorname{var} t}$, where $t_1$ is the value of $t$ for the observed
sample and $k$ is a constant chosen at will according to a probability $\alpha$. If so we may accept
the value of $\theta$, at least so far as this test is concerned; if not, we reject it.

If the variance of $t$ does not depend on unknown quantities such as other parameters,
this type of inference is justifiable as an application of the theory of confidence intervals.
In accepting $\theta$ when it falls in the range $t_1 \pm k\sqrt{\operatorname{var} t}$, we shall be right in a proportion $\alpha$ of
the cases in the long run. As a refinement we may, of course, use non-central intervals
and locate $\theta$ in an asymmetrical range $t_1 - k_0\sqrt{\operatorname{var} t}$ to $t_1 + k_1\sqrt{\operatorname{var} t}$. The test of
significance is equivalent to the estimation of the true value of $\theta$; and it will clearly be better
if the range of estimation is narrower, for then we reject more wrong values of $\theta$.
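As an illustration only, the large-sample rule of 21.5 can be sketched in Python. The sketch assumes that $t$ is approximately normal with a known variance, so that $k$ is taken as the normal deviate whose central probability is $\alpha$; the function name `standard_error_test` and the figures in the usage line are hypothetical.

```python
from statistics import NormalDist


def standard_error_test(t1, var_t, theta0, alpha=0.95):
    """Large-sample test of 21.5: accept theta0 if it lies in t1 +/- k*sqrt(var t).

    k is the normal deviate whose central probability is alpha, which assumes
    the estimator t is approximately normal with known variance var_t.
    """
    k = NormalDist().inv_cdf((1 + alpha) / 2)
    half_width = k * var_t ** 0.5
    interval = (t1 - half_width, t1 + half_width)
    accepted = interval[0] <= theta0 <= interval[1]
    return accepted, interval, k


# Hypothetical figures: an estimate of 10.3 with variance 0.04, tested value 10.
accepted, interval, k = standard_error_test(10.3, 0.04, 10.0)
print(accepted, [round(v, 3) for v in interval], round(k, 2))
```

A non-central range $t_1 - k_0\sqrt{\operatorname{var} t}$ to $t_1 + k_1\sqrt{\operatorname{var} t}$ would only need two different normal deviates in place of the single $k$.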

21.6. If the variance of the estimator $t$ depends on unknown parameters $\theta_1, \dots, \theta_p$
we can usually substitute estimates of those parameters obtained from the sample itself,
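provided that the sample is large. For example, since in normal samples $s$ is approximately
normally distributed about $\sigma$ with standard error $\sigma/\sqrt{2n}$, we have, approximately,

$$P\left\{s < \sigma + \frac{2\sigma}{\sqrt{2n}}\right\} = 0.97725.$$

The sample standard deviation $s$ will differ from $\sigma$ by a quantity of order $n^{-1/2}$, so that
to that order

$$P\left\{s < \sigma + \frac{2s}{\sqrt{2n}}\right\} = 0.97725.$$

The approximation breaks down for small samples, and more accurate methods are required.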

21.7. The use of standard errors in testing significance has been illustrated in previous
chapters, and we need not enlarge on the process further. We may, however, remark
two things:

(a) That if the distribution of an estimator $t$ tends to normality for large samples
irrespective of the parent form (as, for instance, is the case with the mean and other moments
under very general conditions), it is not necessary that the hypothesis should specify the
parent form. In short, our test of significance is independent of the parent, a valuable
generality which rarely obtains for small samples.

(b) That we have justified the logic of reasoning involving the use of standard errors
by the theory of confidence intervals (and a similar justification can be given in terms
of fiducial intervals if we use an efficient estimator for which the loss of information tends
to zero relative to the total information in large samples). This appears to be the most
satisfactory basis for the use of standard errors. The usual intuitive basis advanced
(necessarily) in introductory textbooks is not easy to defend. For instance, it is customary
to reject a value of $\theta$ if it gives a small probability to the observed value $t_1$ or a greater one;
and there is no obvious reason why we should base our inference on the improbability of values
greater than $t_1$, namely on the improbability of something which has not occurred (see 21.55
below). Our present approach shows that in fact the use of standard errors can be justified
logically without invoking a new principle of inference.


Significance of the Mean in Normal Samples 

21.8. Suppose we have a sample from a parent population which is known to be
normal, but of whose mean and variance we are ignorant. We wish to test the significance
of a given value $\mu_0$ of the mean, that is to say, we wish to consider whether the observations
could, to any acceptable probability, have been derived from a population with mean $\mu_0$,
whatever the variance may be.

We calculate the statistic

$$t = \frac{\bar{x} - \mu_0}{s}\sqrt{\nu}, \tag{21.1}$$

where $s^2 = \frac{1}{n}\sum(x - \bar{x})^2$ and $\nu = n - 1$, all the quantities in which are given. We know
that the distribution of $t$ is

$$dF = \frac{1}{\sqrt{\nu}\,B\left(\frac12, \frac{\nu}{2}\right)}\,\frac{dt}{\left(1 + \frac{t^2}{\nu}\right)^{\frac12(\nu+1)}}, \qquad -\infty < t < \infty, \tag{21.2}$$

and hence can find the probability that our calculated value of $t$ is attained or exceeded.
If this is small we reject $\mu_0$; if not, we accept it. What values are regarded as “small”
for this purpose is a matter of convention, but the most frequently used values are 0.05,
0.01 and 0.001.

From the work of the previous two chapters it will be evident that this type of inference
is the confidence- or fiducial-interval approach in a slightly different form. Given
$\alpha$ we can find $-t_0$ and $t_1$ such that the integral of $dF$ in (21.2) between those limits is $\alpha$.
This gives us confidence or fiducial limits to $\mu$ of the type

$$\bar{x} - t_1\frac{s}{\sqrt{\nu}} \le \mu \le \bar{x} + t_0\frac{s}{\sqrt{\nu}},$$

and if $\mu_0$ lies in this range we accept it. In particular cases we may have $t_0 = t_1$, in which case
the intervals are central and $1 - \alpha$ is the chance of $t_1$ being attained or exceeded
in absolute value; or $t_1 = +\infty$, in which case $1 - \alpha$ is the chance that $t \le -t_0$,
and no lower limit to $\mu$ is imposed.


Example 21.1

The weights of fifteen bags of sugar taken from a filling machine are found to be, in
ounces, 16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9, 16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8.
Each bag should be 16 ounces, but some deviation is inevitable. One of the manufacturer’s
problems, of course, is to keep this deviation to a minimum, but that is not the
point we now consider. Our question is: if the machine is supposed to be giving weights
of 16 ounces on the average, does the sample suggest that it is failing in its purpose?

The hypothesis is that the parent mean is 16 ounces, and the deviations from this
mean are, in order of magnitude, −0.3 (twice), −0.2 (four times), −0.1 (twice), 0.0
(four times), 0.1 (twice), 0.2 (once). The mean of these deviations is thus −0.08 and to that
extent the average of the sample is slightly underweight. Is this a significant effect?

It will be found that $s^2 = 0.0216$, so that

$$t = -\frac{0.08}{\sqrt{0.0216}}\sqrt{14} = -2.04, \qquad \nu = 14.$$

From Appendix Table 3 (vol. I, p. 440) we find that for $\nu = 14$ the probability of a deviation
greater in absolute magnitude than 2.04 is about $2(1 - 0.969) = 0.062$. This is small,
but whether we regard it as significant or not depends on the probabilities we are prepared
to consider as defining significance. The usual values are 0.05 and 0.01, and with such
criteria we should not take the observed value as significant, though it arouses suspicions.
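The arithmetic of Example 21.1 can be checked with a short Python sketch. It takes $s^2$ with divisor $n$, as in the example, and evaluates the probability from (21.2) by Simpson's rule rather than from Appendix Table 3; the helper names are our own and the number of integration steps is an arbitrary choice.

```python
import math

weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]
mu0 = 16.0  # parent mean under test


def student_density(t, nu):
    """Ordinate of "Student's" form (21.2)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))
    return math.exp(log_c) * (1 + t * t / nu) ** (-(nu + 1) / 2)


def student_cdf(t, nu, steps=4000):
    """F(t) for t >= 0: one half plus Simpson's rule on (21.2) over (0, t)."""
    h = t / steps
    total = student_density(0.0, nu) + student_density(t, nu)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * student_density(i * h, nu)
    return 0.5 + total * h / 3


n = len(weights)
mean = sum(weights) / n
s2 = sum((x - mean) ** 2 for x in weights) / n      # divisor n, as in the text
nu = n - 1
t = (mean - mu0) / math.sqrt(s2) * math.sqrt(nu)   # statistic (21.1)
p_two_sided = 2 * (1 - student_cdf(abs(t), nu))
print(f"s^2 = {s2:.4f}, t = {t:.2f}, P(|t| >= observed) = {p_two_sided:.3f}")
```

The computed two-sided probability should agree with the tabular figure of about 0.06 to the accuracy of the table.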

We have here used central intervals, which are usual for the $t$-test of significance
of the mean; but it is easy to imagine circumstances in this particular case for which
non-central intervals might be required. For instance, if the machine was at fault and
had a true mean filling weight of more than 16 ounces the manufacturer would be giving
sugar away for nothing. This might be serious, but probably not so serious as if the
machine was erring in the other direction, which would render him liable to prosecution
for selling short weight. Suppose he assessed the latter risk as nine times as serious as
the former and was working to a probability level of 0.05. Then he would require
the probability that $t$ exceeds the lower significance point to be
0.955 (= 1 − 0.045), but could allow the probability that $t$ falls below the upper significance
point to be 0.995 (= 1 − 0.005). From Appendix Table 3 we see that this corresponds to
deviations of approximately −1.8 and +3.0. Our observed value is outside this range
and is thus significant. Small as the average shortage is, it would be prudent to overhaul
the machine and to make sure that it is giving fair weight on the average.

We may note further that if the sample had occurred in the order

15.7, 15.7, 15.8, 15.8, 15.8, 15.8, 15.9, 15.9, 16.0, 16.0, 16.0, 16.0, 16.1, 16.1, 16.2

we should almost certainly have concluded that there was something wrong with the
machine, for the weights are steadily rising. The $t$-test would give the same result for
this sample as for the first, since it does not depend on the order of occurrence of the
members. Where, therefore, the appearance of individual sample members is ordered in time,
the $t$-test alone may fail to reveal significant effects due to the changing of the population
between drawings. Our data are still such as could have arisen at a single drawing of
fifteen members from a population with mean equal to 16 ounces; but the data throw
doubt on the point whether we are really asking the right question in assuming that they
all came from the same population. We consider the point again below (21.41).

Before leaving this example, we may note another possible test, cruder than the $t$-test
but sometimes useful. If the parent mean of the deviations were really zero, positive and
negative deviations should occur equally frequently in the long run. In our present case there are 8
negative deviations, 3 positive ones and 4 zero. If we allot, conventionally, two of the
last to each group we have 10 negative and 5 positive deviations. The expected number
is $7\frac12$, so that the deviation is $2\frac12$, with a standard error of $\sqrt{15 \times \frac12 \times \frac12} = 1.94$. The
observed deviation is very little in excess of this, so we conclude that the preponderance
of negative signs in the sample is not significant of a negative mean in the population.
More exactly, we find that the probability of 5 or fewer positive deviations is the sum of
the first six terms in the binomial $(\frac12 + \frac12)^{15}$, namely 0.151, leading to the same conclusion.

The test is a very rough one since it pays no attention to the magnitude of the deviations;
but it has the advantage of applying to any symmetrical form of parent population for
finite samples.
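The sign test just described reduces to binomial arithmetic. A minimal sketch, assuming the conventional allotment of the four zero deviations (two to each side) used in the text:

```python
from math import comb, sqrt

n, positives = 15, 5            # 10 negative and 5 positive after allotting the zeros
expected = n / 2
standard_error = sqrt(n * 0.5 * 0.5)
# Exact lower tail: the first six terms of the binomial (1/2 + 1/2)^15
p_lower = sum(comb(n, k) for k in range(positives + 1)) / 2 ** n
print(expected - positives, round(standard_error, 2), round(p_lower, 3))
```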


Properties of the t-Distribution

21.9. “ Student’s ” distribution has numerous applications in the testing of signifi- 
cance apart from the one just considered, and we proceed to study its properties. 

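The form (21.2) is a Pearson Type VII and may be transformed to the Beta-distribution
(Type I) by the substitution $x = 1/(1 + t^2/\nu)$. The distribution function of $t$ may thus
be obtained directly from the incomplete B-function. For instance, we have, for $t \ge 0$,

$$2F - 1 = \frac{2}{\sqrt{\nu}\,B\left(\frac12, \frac{\nu}{2}\right)}\int_0^t \frac{dt}{\left(1 + \frac{t^2}{\nu}\right)^{\frac12(\nu+1)}} = 1 - I_x\left(\frac{\nu}{2}, \frac12\right), \qquad x = \frac{\nu}{\nu + t^2}, \tag{21.3}$$

where $I_x(p, q)$ denotes the incomplete B-function ratio.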

The values of the argument for which $I$ has the values 0.50, 0.25, 0.10, 0.05, 0.025, 0.01,
0.005, for $\nu = 1(1)30, 40, 60, 120, \infty$, have been tabled to five significant figures by C. M.
Thompson and others (1941a) and can hence be used to derive the values of $t$ corresponding
to those probability levels.
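A numerical check of (21.3) may be sketched as follows: both sides are evaluated by Simpson's rule, the left from (21.2) and the right from the Beta integral. The check assumes $\nu \ge 2$ so that the Beta integrand stays bounded on $[0, x]$; it is an illustration, not a substitute for the published tables.

```python
import math


def simpson(f, a, b, steps=4000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def beta_fn(a, b):
    """Complete B-function via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def two_tail_from_t(t, nu):
    """2(1 - F) by integrating "Student's" form (21.2) from 0 to t."""
    c = 1 / (math.sqrt(nu) * beta_fn(0.5, nu / 2))
    inner = simpson(lambda u: c * (1 + u * u / nu) ** (-(nu + 1) / 2), 0.0, t)
    return 1 - 2 * inner


def two_tail_from_beta(t, nu):
    """2(1 - F) = I_x(nu/2, 1/2) with x = nu/(nu + t^2); assumes nu >= 2."""
    a, b = nu / 2, 0.5
    x = nu / (nu + t * t)
    integral = simpson(lambda u: u ** (a - 1) * (1 - u) ** (b - 1), 0.0, x)
    return integral / beta_fn(a, b)


for t_val, nu_val in [(2.04, 14), (1.0, 5), (3.0, 30)]:
    print(t_val, nu_val, two_tail_from_t(t_val, nu_val), two_tail_from_beta(t_val, nu_val))
```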


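21.10. Except for special purposes, however, the use of the B-function is unnecessary,
since the distribution function of $t$ itself and tables based thereon are available.

We have

$$\log\left(1 + \frac{t^2}{\nu}\right) = \frac{t^2}{\nu} - \frac{t^4}{2\nu^2} + \frac{t^6}{3\nu^3} - \cdots$$

and hence

$$-\frac{\nu+1}{2}\log\left(1 + \frac{t^2}{\nu}\right) = -\frac{t^2}{2} + \frac{t^4 - 2t^2}{4\nu} - \frac{2t^6 - 3t^4}{12\nu^2} + \cdots + (-1)^{j+1}\frac{j\,t^{2j+2} - (j+1)\,t^{2j}}{2j(j+1)\,\nu^j} + \cdots \tag{21.4}$$

Further, from the expansion for $\log\Gamma(1 + x)$ we find

$$\log\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} = -\frac12\log 2\pi - \frac{1}{4\nu} + \frac{1}{24\nu^3} - \frac{1}{20\nu^5} + \cdots \tag{21.5}$$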

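Now as $\nu$ tends to infinity, $t$ tends to the normal form with zero mean and unit variance.
Writing $\xi$ for the value of $t$ and

$$y = \frac{1}{\sqrt{2\pi}}e^{-\frac12\xi^2},$$

we find for the logarithm of the ordinate of (21.2), in descending powers of $\nu$,

$$\log y + \frac{\xi^4 - 2\xi^2 - 1}{4\nu} - \frac{2\xi^6 - 3\xi^4}{12\nu^2} + \frac{3\xi^8 - 4\xi^6 + 1}{24\nu^3} - \frac{4\xi^{10} - 5\xi^8}{40\nu^4} + \frac{5\xi^{12} - 6\xi^{10} - 3}{60\nu^5} - \cdots \tag{21.6}$$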


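Taking the exponential and integrating from $\xi$ to $\infty$, we find

$$1 - F = \int_\xi^\infty \frac{e^{-\frac12 u^2}}{\sqrt{2\pi}}\,du + y\left\{\frac{(\xi^2 + 1)\xi}{4\nu} + \frac{(3\xi^6 - 7\xi^4 - 5\xi^2 - 3)\xi}{96\nu^2} + \frac{(\xi^{10} - 11\xi^8 + 14\xi^6 + 6\xi^4 - 3\xi^2 - 15)\xi}{384\nu^3} + \frac{(15\xi^{14} - 375\xi^{12} + 2225\xi^{10} - 2141\xi^8 - 939\xi^6 - 213\xi^4 - 915\xi^2 + 945)\xi}{92160\nu^4} + \cdots\right\} \tag{21.7}$$

This is the expression, due to Fisher, which was used by “Student” himself in calculating
the distribution function of $t$ given in Appendix Table 3, Vol. I. For values of $\nu > 18$ the
first four terms of (21.7) give $F$ to an accuracy of about 0.000005.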
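To see how quickly (21.7) converges, the four correction terms can be compared with a direct numerical integration of (21.2). The sketch below is illustrative; the step count and the test points are arbitrary choices.

```python
import math


def student_upper_tail_numeric(t, nu, steps=4000):
    """1 - F(t) for t >= 0 by Simpson's rule applied to (21.2) over (0, t)."""
    log_c = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
             - 0.5 * math.log(nu * math.pi))

    def f(u):
        return math.exp(log_c) * (1 + u * u / nu) ** (-(nu + 1) / 2)

    h = t / steps
    total = f(0.0) + f(t)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    return 0.5 - total * h / 3


def student_upper_tail_fisher(xi, nu):
    """1 - F from the normal tail plus the four correction terms of (21.7)."""
    y = math.exp(-xi * xi / 2) / math.sqrt(2 * math.pi)   # normal ordinate
    normal_tail = 0.5 * math.erfc(xi / math.sqrt(2))
    x = xi * xi
    series = (
        (x + 1) * xi / (4 * nu)
        + (3 * x**3 - 7 * x**2 - 5 * x - 3) * xi / (96 * nu**2)
        + (x**5 - 11 * x**4 + 14 * x**3 + 6 * x**2 - 3 * x - 15) * xi / (384 * nu**3)
        + (15 * x**7 - 375 * x**6 + 2225 * x**5 - 2141 * x**4
           - 939 * x**3 - 213 * x**2 - 915 * x + 945) * xi / (92160 * nu**4)
    )
    return normal_tail + y * series


for nu, t in [(20, 2.086), (30, 2.0), (60, 1.5)]:
    exact = student_upper_tail_numeric(t, nu)
    approx = student_upper_tail_fisher(t, nu)
    print(f"nu={nu:3d} t={t:5.3f} numeric={exact:.7f} series={approx:.7f}")
```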


21.11. Tables are also available in the “inverse” form, that is to say, giving values
of $t$ corresponding to specified values of $\nu$ and $F$. Such tables may be derived by
interpolation from the “Student” tables or by the normalisation method of 6.32. In work
involving tests of significance this type of table is perhaps the most convenient, since it
enables one to decide without calculation (other than interpolation for values of the
argument not covered by the tables) whether particular values are significant for chosen
probability $\alpha$. The complement of the probability $\alpha$ is spoken of as a level of significance
and expressed either as a number between 0 and 1 or as a percentage. Similarly the
corresponding values of $t$ are called significance points, and we may speak, for example,
of the 5 per cent. value of $t$, meaning that value for which $F$ is 0.95.

Fisher and Yates (1938a) give the values of $t$ for $\nu = 1(1)30, 40, 60, 120$ and $\infty$ and
$2(1 - F) = 0.9(0.1)0.1, 0.05, 0.02, 0.01, 0.001$. These tables, it should be remembered,
give the significance points corresponding to twice $1 - F$, that is to say the values of $t$
such that the proportion of the distribution outside the range $\pm t$ is $2(1 - F)$.

21.12. The number $\nu$ is usually called the number of degrees of freedom of $t$. This
is an expression which occurs in other connections, and a few words of explanation are
desirable.

It has been seen that the variance of a normal sample is distributed like the sum of
$(n - 1)$ squares of independent variates (compare Example 10.6, vol. I, p. 238) and,
generally, that if there are $k$ linear relations connecting the original variates, the sum of squares
of the originals is distributed as the sum of squares of $n - k$ independent normal variates of equal
variance. Each linear relation reduces the freedom of the variation, as it were, by unity.
It is thus natural to speak of the number of degrees of freedom, $\nu$, of a function such as
$\chi^2$, meaning thereby that it is distributed as the sum of squares of $\nu$ independent
normal variates with equal variance. The expression only has this natural meaning when
normal variation is concerned.

It so happens that the quantity $t$ depends on a parameter $\nu$ which is convenient for
tabulating its distribution function and is also the number of degrees of freedom of the
statistic $s^2$ entering into the denominator of $t$. $\nu$ may thus, by an extension of the term,
be called the number of degrees of freedom of $t$, but this usage does not imply that $t$ is
distributed as the sum of squares of normal variates.

Distribution of t in the Non-normal Case

21.13. Part of the price we have to pay for the precision of the $t$-test in small samples
is the assumption of normality in the parent. If the population is not normal we may still,
of course, consider the distribution of “Student’s” ratio, which will remain independent
of the scale parameter; but complications appear because the parameters which express
the deviation from normality will, in general, appear in the sampling distribution.
Furthermore, the distributions of $\bar{x}$ and $s$ are no longer independent.

Let us in the first instance prove the last assertion, which is due to Geary (1936b),
in the form: If the mean and variance in samples from a population are independent
and the population has finite cumulants, it must be normal.

From 11.13 we have

$$\kappa(2\,1^r) = \frac{\kappa_{r+2}}{n^r}, \qquad r > 0.$$

If mean and variance are independent, $\kappa(2\,1^r) = 0$ and hence $\kappa_{r+2} = 0$ for $r > 0$. Thus
the population must be normal. It is rather remarkable that we have not had to use
relations of the type $\kappa(2^s 1^r) = 0$, $s > 1$, in arriving at this result, and that we need only
assume independence for one size of sample.
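The relation $\kappa(2\,1) = \kappa_3/n$ (the case $r = 1$ above) can be illustrated by simulation. The sketch below estimates the covariance of the sample mean and $k_2$ for an exponential parent, for which $\kappa_3 = 2$, and for a normal parent; the sample size, number of replications and seed are arbitrary, so the estimates agree with theory only up to sampling error.

```python
import random


def mean_k2_covariance(draw, n=10, reps=20000, seed=1):
    """Monte Carlo covariance between the sample mean and k2 (divisor n - 1)."""
    rng = random.Random(seed)
    means, k2s = [], []
    for _ in range(reps):
        xs = [draw(rng) for _ in range(n)]
        m = sum(xs) / n
        means.append(m)
        k2s.append(sum((x - m) ** 2 for x in xs) / (n - 1))
    mbar = sum(means) / reps
    kbar = sum(k2s) / reps
    return sum((a - mbar) * (b - kbar) for a, b in zip(means, k2s)) / (reps - 1)


# Exponential parent with unit mean: kappa_3 = 2, so kappa_3 / n = 0.2 when n = 10.
cov_exponential = mean_k2_covariance(lambda r: r.expovariate(1.0))
# Normal parent: kappa_3 = 0, so the covariance should be close to zero.
cov_normal = mean_k2_covariance(lambda r: r.gauss(0.0, 1.0))
print(f"exponential: {cov_exponential:.3f} (theory 0.2); normal: {cov_normal:.3f} (theory 0)")
```

A non-zero covariance already shows that mean and variance cannot be independent in the skew case.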
21.14. In the notation of Chapter 11 we write



and expand in terms of powers of K. The method follows that of 11.23, and we find for the
moments of $t$ about the parent mean, assumed zero, to order
Ml = — + 5AsA« I • 

= 1 + H (1 + ^) + 4- (3 - A, - 3A, A. -h 6Ai A.) 

V 1 ** 

Ms = - (210A, - 66As + 106A, A« + 210Ai)| ‘ ' 

= 3 + ? (9 - A, + 14A*) + 4 (102 - 3OA4 + 24A» 

V 

+ 120A2 -t- 4Ae I32A3A5 - 6Af + I68AIA4 + 120AJ) 

where $\lambda_r = \kappa_r/\kappa_2^{r/2}$.

If the parent form is symmetrical, cumulants of odd order vanish and we have, to
order $n^{-2}$ and first-order terms in the $\lambda$’s,

$$\mu_1' = \mu_3 = 0,$$

$$\mu_2 = 1 + \frac{2}{n} + \frac{6 - 2\lambda_4}{n^2} = \frac{n-1}{n-3} - \frac{2\lambda_4}{n^2},$$

$$\mu_4 = 3 + \frac{18 - 2\lambda_4}{n} + \frac{102 - 30\lambda_4}{n^2} = \frac{3(n-1)^2}{(n-3)(n-5)} - \frac{2\lambda_4}{n} - \frac{30\lambda_4}{n^2}. \tag{21.9}$$

Except for the terms in $\lambda_4$ these are the values of the moments of $t$ in “Student’s”
distribution, and it follows that for symmetrical parents which are not excessively lepto-
or platykurtic we should not expect the $t$-test to be invalidated. If the parent is skew
the situation may be different.


21.15. The general skew case has been considered by E. S. Pearson and Adyanthaya
(1928, 1929) from the experimental viewpoint and by Bartlett (1935a) and Geary (1936b)
from the theoretical viewpoint. Various writers have derived exact distributions of $t$
in non-normal samples, but the sample numbers are, as a rule, trivially small and the
results of little practical value. Geary considers the population expressed by the first
two terms of the Gram-Charlier series,

$$dF = \frac{1}{\sqrt{2\pi}}\left\{1 - \frac{\kappa_3}{6}(3x - x^3)\right\}e^{-\frac12 x^2}\,dx, \tag{21.10}$$

and assumes that powers of $\kappa_3$ above the first may be neglected. He finds (cf. Exercise
21.1) that the frequency function of $t$ in this population is equal to the “Student” form
plus a corrective factor

$$\frac{\kappa_3}{6\nu\sqrt{2\pi(\nu+1)}}\left\{3\nu - t^2(2\nu+1)\right\}\frac{t\,dt}{\left(1 + \frac{t^2}{\nu}\right)^{\frac12(\nu+4)}}. \tag{21.11}$$

The integral of this factor from $-\infty$ to $-t$ is

$$\frac{\kappa_3}{6\sqrt{2\pi(\nu+1)}}\left(1 + \frac{2\nu+1}{\nu}t^2\right)\left(1 + \frac{t^2}{\nu}\right)^{-\frac12(\nu+2)}, \tag{21.12}$$

giving the correction to be applied. (Geary gives a table for some representative values.)
This, of course, depends on $\kappa_3$, but even where exact knowledge of the skewness is not
available we may sometimes safeguard against error by considering the correction for
plausible values of $\kappa_3$.
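Since (21.12) is obtained by integrating (21.11), the two can be checked against each other numerically. A sketch with arbitrarily chosen $\nu$, $\kappa_3$ and $t$; the lower limit $-\infty$ is replaced by $-60$, where the integrand is negligible for these values.

```python
import math


def geary_density_correction(t, nu, kappa3):
    """Corrective term (21.11) per unit of t, to first order in kappa3."""
    c = kappa3 / (6 * nu * math.sqrt(2 * math.pi * (nu + 1)))
    return c * (3 * nu - t * t * (2 * nu + 1)) * t * (1 + t * t / nu) ** (-(nu + 4) / 2)


def geary_tail_correction(t, nu, kappa3):
    """Closed form (21.12): integral of (21.11) from minus infinity to -t."""
    c = kappa3 / (6 * math.sqrt(2 * math.pi * (nu + 1)))
    return c * (1 + (2 * nu + 1) * t * t / nu) * (1 + t * t / nu) ** (-(nu + 2) / 2)


def simpson(f, a, b, steps=20000):
    """Composite Simpson's rule on [a, b]; steps must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


nu, kappa3, t = 9, 0.5, 1.8          # arbitrary illustrative values
numeric = simpson(lambda u: geary_density_correction(u, nu, kappa3), -60.0, -t)
closed = geary_tail_correction(t, nu, kappa3)
print(f"numerical integral {numeric:.10f}, formula (21.12) {closed:.10f}")
```

For a positively skew parent ($\kappa_3 > 0$) the correction is positive, that is, the lower tail of $t$ is heavier than in the normal case.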


Other Uses of the t-distribution

21.16. The usefulness of “ Student’s ” t derives from the fact that it is independent 
of the scale parameter, and the simplicity of its distribution from the fact that it is the 
ratio of two independent variates, the numerator distributed normally and the denominator 
distributed in the Type III form. We shall see below (21.26) that these properties can 
be used to test the difference of two means in normal populations with equal variance, 
and in Chapter 22 we shall encounter a test of regression coefficients which is based on 
the same properties. 

We have also noted that “ Student’s ” form can be used to test the significance of the 
product-moment correlation (14.15) and the Spearman rank correlation $\rho$ (16.18). These
facts are, however, in a sense accidental. They do not derive from the expression of the
parameters concerned as the ratio of a normal to a Type III variate, but from the simpler
fact that the distributions are of the Type II form (symmetrical with finite range) and
hence can be transformed to the “Student” distribution, which is of Type VII. Symmetrical
distributions of finite range can often be represented very approximately by a
transformation to the “Student” form, especially if they tend to normality.


Test of a Variance in Normal Samples 

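21.17. The distribution of the sample variance in normal samples is, in terms of the
standard deviation $s$,

$$dF = \frac{2}{\Gamma\left(\frac{n-1}{2}\right)}\left(\frac{n}{2\sigma^2}\right)^{\frac{n-1}{2}} s^{n-2}\exp\left(-\frac{ns^2}{2\sigma^2}\right)ds, \qquad 0 < s < \infty. \tag{21.13}$$

Thus, given for consideration a value of $\sigma^2$ and an observed $s^2$, we can find the probability
that $s^2$ is attained or exceeded and accept or reject $\sigma^2$ in the usual way. The distribution
function of (21.13) may be expressed as an incomplete $\Gamma$-function, or more conveniently
for statistical purposes in terms of $\chi^2$ ($= ns^2/\sigma^2$) with $\nu = n - 1$.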


Example 21.2 

In Example 21.1 we found $s^2 = 0.0216$, $\nu = 14$. Could the data have arisen by chance
from a population in which the true variance is 0.01?

We have $\chi^2 = ns^2/\sigma^2 = 32.4$, $\nu = 14$. From the diagram on p. 446 of vol. I we see
that the probability of such a value or greater is between 0.01 and 0.001, a very improbable
result; and hence we reject $\sigma^2 = 0.01$ as a value of the parent variance.

Once again this type of inference can be justified by the theory of confidence intervals,
since the probability

$$P\{\chi^2 > 32.4\} < 0.01$$

is equivalent to

$$P\left\{\sigma^2 < \frac{ns^2}{32.4}\right\} < 0.01.$$

In asserting that $\sigma^2$ was less than $ns^2/32.4$ (in our present case 0.01) we should be wrong
more than 99 times in 100 on the average.
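The arithmetic of Example 21.2 can be sketched in Python. Because $\nu = 14$ is even, the upper tail of $\chi^2$ has the closed form $e^{-x/2}\sum_{k<\nu/2}(x/2)^k/k!$, which avoids reading from the diagram; this shortcut holds only for even $\nu$, and the function name is our own.

```python
import math

weights = [16.1, 15.8, 15.8, 15.9, 16.1, 16.2, 16.0, 15.9,
           16.0, 15.7, 15.7, 15.8, 16.0, 16.0, 15.8]
sigma2_0 = 0.01  # parent variance under test


def chi2_upper_tail_even(x, nu):
    """P{chi^2 > x} for even nu, via exp(-x/2) * sum_{k < nu/2} (x/2)^k / k!."""
    if nu % 2:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    half = x / 2
    term = total = 1.0
    for k in range(1, nu // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


n = len(weights)
mean = sum(weights) / n
s2 = sum((x - mean) ** 2 for x in weights) / n   # divisor n
chi2 = n * s2 / sigma2_0
nu = n - 1
p = chi2_upper_tail_even(chi2, nu)
print(f"chi^2 = {chi2:.1f} on {nu} d.f., P = {p:.4f}")
```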

There is a point of interest to note here. In Example 21.1 we considered a hypothesis
as to the mean $\mu$, and in the present example a hypothesis as to the variance $\sigma^2$. Had we
considered the two together, that is to say the compound hypothesis that $\mu = 16$ and
$\sigma^2 = 0.01$, we should have been in difficulties in justifying our procedure by reference to
confidence or fiducial intervals, since we could no longer assert that our conclusions were
right in an assigned proportion of cases. We have avoided this complication by
considering separately the hypotheses (a) that $\mu = 16$ whatever the variance, and (b) that
$\sigma^2 = 0.01$ whatever the mean. This resource is not as a rule open to us where non-normal
variation is concerned.

Tests of Normality 

21.18. In large samples we can group the data into ranges and compare the actual 
frequencies with those to be expected on the hypothesis of parent normality. This com- 
parison over the course of the frequency function is not satisfactory for small samples 
unless the grouping is so broad as to deprive the test of most of its efficacy. An alter- 
native is to compute some statistic of the sample and to examine how far it departs from 
the mean value to be expected on the hypothesis of parent normality. 

Consider, for instance, the statistic

$$t = \frac{k_3}{k_2^{3/2}}. \tag{21.14}$$

This is independent of the mean (because the $k$-statistics are so) and is also independent
of the scale parameter because it is “studentised”. In normal samples, therefore, the
distribution of $t$ is independent of mean and variance and thus depends only on the sample
number $n$. We have already given formulae for its mean and variance (Exercise 11.16,
vol. I, p. 289). In fact,

$$E(t) = 0, \qquad \operatorname{var}(t) = \frac{6n(n-1)}{(n-2)(n+1)(n+3)}. \tag{21.15}$$

Since the distribution of $t$ is symmetrical we may, for moderate $n$, consider it as normally
distributed with zero mean and variance given by (21.15), and this will provide a test,
of a somewhat approximate kind, of normality in the parent from which the sample is
derived.

Example 21.3 

In the data of Examples 21.1 and 21.2 we have, in units of 0.1, for the first moment about the
origin 16 and the second and third moments about the mean,

$$m_1' = -0.8, \qquad m_2 = 2.16, \qquad m_3 = 0.496,$$

whence

$$k_2 = \frac{n}{n-1}m_2 = 2.31429, \qquad k_3 = \frac{n^2}{(n-1)(n-2)}m_3 = 0.61319,$$

and $t = k_3/k_2^{3/2} = 0.174$.

The variance of $t$, from (21.15), is 0.3365 and its standard error accordingly about
0.58. The observed deviation from zero is considerably less than this, and we see no reason
to doubt the hypothesis of normality so far as this test is concerned.
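The computations of Example 21.3 can be reproduced from the deviations in units of 0.1. A minimal sketch, with moments taken about the sample mean using divisor $n$ and the $k$-statistics formed from them as in the text; variable names are our own.

```python
# Deviations (weight - 16) in units of 0.1, from Example 21.1
dev = [1, -2, -2, -1, 1, 2, 0, -1, 0, -3, -3, -2, 0, 0, -2]
n = len(dev)

m1 = sum(dev) / n                                  # first moment about 16
m2 = sum((d - m1) ** 2 for d in dev) / n           # moments about the mean
m3 = sum((d - m1) ** 3 for d in dev) / n
k2 = n * m2 / (n - 1)
k3 = n * n * m3 / ((n - 1) * (n - 2))
t = k3 / k2 ** 1.5                                 # statistic (21.14)
var_t = 6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3))   # (21.15)
print(f"k2={k2:.5f} k3={k3:.5f} t={t:.3f} var(t)={var_t:.4f} s.e.={var_t ** 0.5:.2f}")
```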


21.19. Another test of normality has been proposed by Geary (1935a), namely
the use of the ratio

$$w = \frac{\text{mean deviation}}{\text{standard deviation}}. \tag{21.16}$$

If the parent mean is zero, the value of $w$ in a normal parent is $\sqrt{2/\pi} = 0.79788$. The test has
also been adapted to the case when the parent mean is not zero, and tables provided for the
application of the test (Geary and Pearson, 1938).

Geary’s ratio is directed towards detecting deviations from mesokurtosis in the parent.
The criterion based on $k_4/k_2^2$, which is a natural extension of that for skewness based on
$k_3/k_2^{3/2}$, is not very suitable for the purpose, since it has a skew distribution for quite high
values of $n$. The distribution of Geary’s ratio tends to normality fairly rapidly
(cf. Exercise 21.2).
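For comparison, Geary’s ratio (21.16) for the sugar data of Example 21.1 can be computed as below. This sketch assumes the mean deviation and standard deviation are both taken about the sample mean with divisor $n$; it does not reproduce the Geary and Pearson tables, so no significance judgement is attached.

```python
from math import pi, sqrt

# Deviations (weight - 16) in units of 0.1, from Example 21.1; w is scale-free
dev = [1, -2, -2, -1, 1, 2, 0, -1, 0, -3, -3, -2, 0, 0, -2]
n = len(dev)
mean = sum(dev) / n
mean_deviation = sum(abs(d - mean) for d in dev) / n
standard_deviation = sqrt(sum((d - mean) ** 2 for d in dev) / n)
w = mean_deviation / standard_deviation          # Geary's ratio (21.16)
normal_value = sqrt(2 / pi)                      # value of w in a normal parent
print(f"w = {w:.4f}; normal value {normal_value:.5f}")
```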


Tests of Goodness of Fit

21.20. In Chapter 12 we considered in some detail the use of $\chi^2$ in testing
correspondence between observation and hypothesis. If the hypothesis specifies the theoretical
values completely no question of estimation arises, and each cell contributing to $\chi^2$ could,
if so desired, be tested separately. From this point of view $\chi^2$ compounds into a single
test a number of tests of the kind already considered.

If the hypothesis does not specify the theoretical values completely, but leaves them
to be estimated in part from the data, some modification in the $\chi^2$-test is necessary. We
can now establish a result which in 12.13 was announced without proof: if the estimators
employed are maximum likelihood estimators, then for large samples the $\chi^2$-test of
significance retains its validity, provided that the number of degrees of freedom is reduced by
unity for every parameter estimated.

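Suppose the hypothesis leaves unspecified a parameter $\theta$, and let $t$ be its maximum
likelihood estimator. Then if the observed frequencies are $x$, the theoretical frequencies
based on the true value of $\theta$ are $\lambda$ and those based on $t$ are $\lambda'$, and the total frequency
is $n$, we may write

$$\chi^2 = \sum\frac{(x - \lambda)^2}{\lambda} = \sum\frac{x^2}{\lambda} - n, \tag{21.17}$$

$$\chi'^2 = \sum\frac{(x - \lambda')^2}{\lambda'} = \sum\frac{x^2}{\lambda'} - n, \tag{21.18}$$

where $\chi^2$ is distributed as the sum of squares of $\nu$ normal variates with unit variance. The
problem is to find the distribution of $\chi'^2$. We have

$$\chi^2 - \chi'^2 = \sum x^2\left(\frac{1}{\lambda} - \frac{1}{\lambda'}\right),$$

and for large samples $\delta\theta = \theta - t$ will be of order $n^{-1/2}$, so that $\lambda$ and $\lambda'$ differ by a
proportion of that order. Expanding the difference in terms of $\delta\theta$, to order $n^{-1}$,

$$\frac{1}{\lambda} - \frac{1}{\lambda'} = -\frac{1}{\lambda'^2}\frac{\partial\lambda'}{\partial\theta}\,\delta\theta + \frac12\left\{\frac{2}{\lambda'^3}\left(\frac{\partial\lambda'}{\partial\theta}\right)^2 - \frac{1}{\lambda'^2}\frac{\partial^2\lambda'}{\partial\theta^2}\right\}(\delta\theta)^2.$$

We then have

$$\chi^2 - \chi'^2 = -\delta\theta\sum\frac{x^2}{\lambda'^2}\frac{\partial\lambda'}{\partial\theta} + \frac{(\delta\theta)^2}{2}\sum x^2\left\{\frac{2}{\lambda'^3}\left(\frac{\partial\lambda'}{\partial\theta}\right)^2 - \frac{1}{\lambda'^2}\frac{\partial^2\lambda'}{\partial\theta^2}\right\}. \tag{21.19}$$

Now for large samples the maximisation of the likelihood is equivalent to minimising $\chi'^2$,
and hence

$$\frac{\partial\chi'^2}{\partial\theta} = -\sum\frac{x^2}{\lambda'^2}\frac{\partial\lambda'}{\partial\theta} = 0,$$

so that the first term of (21.19) vanishes. In the second term we may, to this order, replace
$x$ by $\lambda'$; and since $\sum\lambda' = n$ for all $\theta$, $\sum\partial^2\lambda'/\partial\theta^2 = 0$. Thus

$$\chi^2 - \chi'^2 = (\delta\theta)^2\sum\frac{1}{\lambda'}\left(\frac{\partial\lambda'}{\partial\theta}\right)^2. \tag{21.20}$$

But the sum on the right is the reciprocal of the variance of the maximum likelihood
estimator, and writing $\delta t$ for $\delta\theta$, as is legitimate for large samples, we have

$$\chi^2 = \chi'^2 + \frac{(\delta t)^2}{\operatorname{var} t}. \tag{21.21}$$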

The quantity on the right is itself the square of a variate which (in the limit) is normal
and has unit variance. Furthermore, its distribution is independent of that of $\chi'^2$. To see
this, consider the spherically symmetric density-distribution of the $\nu$ normal variables whose
sum of squares composes $\chi^2$. Let O be the origin and P any point; then $\chi^2 = OP^2$. Now
for large samples the variation takes place in the neighbourhood of O. A surface of constant
$t$ through P is approximately plane in the effective range of variation. If OQ is the
normal to this surface,

$$OP^2 = OQ^2 + PQ^2,$$

corresponding to

$$\chi^2 = \frac{(t-\theta)^2}{\operatorname{var} t} + \chi'^2,$$

for $t$ is chosen so as to minimise $\chi'^2 = PQ^2$. Thus if we take $t$ as a new co-ordinate, together
with $(\nu - 1)$ others in the surface of constant $t$, the axis of $t$ is orthogonal to the space of
constant $t$, and $t$ will be independent of $\chi'^2$.

It follows further that $\chi'^2$ is distributed as the sum of $(\nu - 1)$ squares of normal
variates. Thus the usual Type III distribution of $\chi^2$ holds for $\nu - 1$ degrees of freedom;
and similarly when several constants are fitted, with a reduction of unity in the number of
degrees of freedom for each constant. We have already exemplified the use of the result in
Example 12.4 (Vol. I, p. 301).
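The loss of one degree of freedom can be checked by simulation. The Python sketch below is illustrative only, and its set-up is an assumption made for the example: grouped counts of a binomial variate with 3 trials (four cells, so $\nu = 3$), true $p = 0.3$, $n = 500$, and 2000 replications, with $p$ estimated by maximum likelihood from the grouped counts. If the argument above holds, the mean of $\chi^2$ computed with the true parameter should be near $\nu = 3$, and the mean computed with the estimate $t$ near $\nu - 1 = 2$.

```python
import math
import random


def binomial_cell_probs(p, size=3):
    """Cell probabilities of a binomial(size, p) variate; cells are j = 0..size."""
    return [math.comb(size, j) * p**j * (1 - p) ** (size - j) for j in range(size + 1)]


def pearson_chi2(observed, probs):
    """Pearson's chi-squared: sum of (m - lambda)^2 / lambda with lambda = n * prob."""
    n = sum(observed)
    return sum((m - n * q) ** 2 / (n * q) for m, q in zip(observed, probs))


def mean_chi2(reps=2000, n=500, p_true=0.3, size=3, seed=1):
    """Average chi-squared using the true p and using the ML estimate t of p."""
    rng = random.Random(seed)
    cells = list(range(size + 1))
    true_probs = binomial_cell_probs(p_true, size)
    total_true = total_fitted = 0.0
    for _ in range(reps):
        draws = rng.choices(cells, weights=true_probs, k=n)
        observed = [draws.count(j) for j in cells]
        # ML estimator from grouped binomial counts: total successes / total trials
        t = sum(j * m for j, m in zip(cells, observed)) / (size * n)
        total_true += pearson_chi2(observed, true_probs)
        total_fitted += pearson_chi2(observed, binomial_cell_probs(t, size))
    return total_true / reps, total_fitted / reps


with_true_theta, with_estimate = mean_chi2()
print(f"mean chi^2 with true parameter: {with_true_theta:.2f} (nu = 3)")
print(f"mean chi^2 with ML estimate:    {with_estimate:.2f} (nu - 1 = 2)")
```

The simulation only illustrates the large-sample result; with small expected cell frequencies the agreement would be poorer.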


The $\omega^2$-distribution

21.21. For small samples the $\chi^2$-test is difficult to apply, since it depends for its
validity on the fact that the binomial distribution in individual cells may be represented
by the normal distribution, and hence requires that cell-frequencies shall not be small.
A test of a different kind has been proposed by Cramér (1928) and independently by von
Mises (1931).

Put

$$\omega^2 = \int_{-\infty}^{\infty}\{F^*(x) - F(x)\}^2\,dx, \qquad (21.22)$$

where $F^*(x)$ is the observed distribution function and $F(x)$ the hypothetical distribution
function. The quantity varies from sample to sample, its mean value being

$$E(\omega^2) = \frac{1}{n}\int_{-\infty}^{\infty} F(x)\{1 - F(x)\}\,dx = \frac{\Delta_1}{2n}, \qquad (21.23)$$

where $\Delta_1$ is Gini's coefficient of mean difference (cf. 2.24). For

$$E(\omega^2) = \int_{-\infty}^{\infty} E\{F^* - F\}^2\,dx.$$

For any given $x$ the expectation of $(F^* - F)^2$ is merely the variance of the proportion $F^*$,
and hence is equal to $F(1-F)/n$. The result (21.23) follows at once.

The $\omega^2$-test consists of comparing the observed value of $\omega^2$ with its mean value; but it is not
possible to express the comparison in terms of probability, as the sampling distribution
of $\omega^2$ is not known.
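As a concrete check of (21.23), take the hypothesis $F(x) = x$ on $(0, 1)$, for which $\int F(1-F)\,dx = 1/6$ and hence $E(\omega^2) = 1/(6n)$. Because $F^*$ is a step function, $\omega^2$ can be integrated exactly piece by piece between the order statistics. The sketch below does this and compares a simulated mean with $1/(6n)$; the sample size, number of replications and seed are arbitrary choices for illustration.

```python
import random


def omega2_uniform(sample):
    """omega^2 = integral of {F*(x) - F(x)}^2 dx for the hypothesis F(x) = x on (0, 1).

    F* is constant between successive order statistics, so the integral is a sum
    of exact pieces: the integral of (x - level)^2 from a to b.
    """
    xs = sorted(sample)
    n = len(xs)
    knots = [0.0] + xs + [1.0]
    total = 0.0
    for i in range(n + 1):
        a, b = knots[i], knots[i + 1]
        level = i / n  # value of F* on [a, b)
        total += ((b - level) ** 3 - (a - level) ** 3) / 3.0
    return total


def mean_omega2(n, reps=20000, seed=2):
    """Monte Carlo mean of omega^2 for samples of n drawn from the hypothetical F."""
    rng = random.Random(seed)
    return sum(omega2_uniform([rng.random() for _ in range(n)]) for _ in range(reps)) / reps


n = 10
print(mean_omega2(n), 1 / (6 * n))  # (21.23): E(omega^2) = 1/(6n) for this F
```

For a single observation at 0.5 the exact value is $1/24 + 1/24 = 1/12$, which makes a convenient hand check of the piecewise integration.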


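21.22. The numerical evaluation of the integral (21.22) is tedious in the case of a
continuous distribution, and Wold (1938a) has suggested a modification. If the variate
range is divided into intervals at $-\infty, x_1, x_2, \ldots, \infty$, we define

$$w^2 = \sum_i\{F^*(x_i) - F(x_i)\}^2. \qquad (21.24)$$

If the intervals are all of width $h$,

$$E(w^2) = \frac{1}{nh}\int_{-\infty}^{\infty} F(x)\{1 - F(x)\}\,dx + R_h, \qquad (21.25)$$

where $R_h$ is a remainder term. If this may be neglected, the $w^2$-test is equivalent to the
$\omega^2$-test but easier to apply. If the data are ungrouped, the $x_i$'s may be taken at equidistant
intervals.

In the particular case when $F$ is normal (with unit variance, say), we have

$$nE(\omega^2) = \int_{-\infty}^{\infty}\int_{-\infty}^{x}\int_{x}^{\infty}\frac{1}{2\pi}\,e^{-\frac{1}{2}(u^2+v^2)}\,dv\,du\,dx.$$

Putting $u = \alpha + x$ and $v = \beta + x$, we find, after integration with respect to $x$,

$$nE(\omega^2) = \frac{1}{2\sqrt{\pi}}\int_{-\infty}^{0}\int_{0}^{\infty} e^{-\frac{1}{4}(\alpha-\beta)^2}\,d\beta\,d\alpha.$$

A further substitution of $\gamma = \alpha - \beta$ and $\delta = \alpha + \beta$ gives

$$nE(\omega^2) = \frac{1}{4\sqrt{\pi}}\int_{-\infty}^{0}\int_{\gamma}^{-\gamma} e^{-\frac{1}{4}\gamma^2}\,d\delta\,d\gamma = \frac{1}{\sqrt{\pi}}, \qquad (21.26)$$

and hence, for a normal population with standard deviation $\sigma$,

$$E(\omega^2) = \frac{\sigma}{n\sqrt{\pi}}. \qquad (21.27)$$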
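The normal case can be checked numerically. For a standard normal $F$, $\int F(1-F)\,dx = 1/\sqrt{\pi}$ (half of Gini's mean difference $2/\sqrt{\pi}$), and Wold's sum gives $nE(w^2) = \sum_i F(x_i)\{1 - F(x_i)\}$, since each term is the variance of a sample proportion. The sketch below (an illustration, not Wold's own procedure) evaluates that sum on an equidistant grid of width $h$ and compares $h \cdot nE(w^2)$ with $1/\sqrt{\pi}$; the grid limits $\pm 10$ are an assumption, beyond which the summand is negligible.

```python
import math


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def wold_n_expected_w2(h, lo=-10.0, hi=10.0):
    """n E(w^2) = sum of F(x_i){1 - F(x_i)} over points x_i spaced h apart (standard normal F)."""
    steps = int(round((hi - lo) / h))
    return sum(normal_cdf(lo + i * h) * (1.0 - normal_cdf(lo + i * h)) for i in range(steps + 1))


for h in (1.0, 0.5, 0.1):
    # h * n E(w^2) against n E(omega^2) = 1/sqrt(pi) for unit variance
    print(h, h * wold_n_expected_w2(h), 1 / math.sqrt(math.pi))
```

For this smooth integrand the remainder $R_h$ is already very small at moderate $h$, so the two tests agree closely in expectation.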
21.23. An interesting modification of the $\omega^2$-test has been given by Smirnoff (1936),
who defines

$$\omega^2 = \int_{-\infty}^{\infty}(F^* - F)^2\,dF. \qquad (21.28)$$

The difference lies in the differential element, which has the effect of rendering
the distribution of $\omega^2$ independent of $F$. It is shown that as $n$ tends to infinity the
distribution function $P(n\omega^2 \leqslant x)$ tends to the form

$$1 - \frac{1}{\pi}\sum_{j=1}^{\infty}(-1)^{j-1}\int_{(2j-1)^2\pi^2}^{4j^2\pi^2}\sqrt{\frac{-\sqrt{y}}{\sin\sqrt{y}}}\;\frac{e^{-\frac{1}{2}xy}}{y}\,dy, \qquad (21.29)$$

but this does not look a very promising formula for application in particular cases.

Cramér (1928) has extended formula (21.27) to the goodness of fit of Gram-Charlier
series and gives some examples of fitting to observed distributions.
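Smirnoff's statistic is easy to compute in practice. The sketch below uses the standard computing formula $n\omega^2 = \frac{1}{12n} + \sum_{i}\{F(x_{(i)}) - \frac{2i-1}{2n}\}^2$ for the ordered sample (a known identity, not given in the text). Because only the values $F(x_{(i)})$ enter, two samples with the same probability-integral values give the same statistic whatever $F$ is, which illustrates why the distribution does not depend on $F$. The samples used are arbitrary.

```python
from statistics import NormalDist


def smirnov_n_omega2(sample, cdf):
    """n * omega^2 with omega^2 as in (21.28), via the computing formula
    1/(12n) + sum over i of {F(x_(i)) - (2i - 1)/(2n)}^2.
    """
    u = sorted(cdf(x) for x in sample)
    n = len(u)
    return 1.0 / (12 * n) + sum((ui - (2 * i - 1) / (2 * n)) ** 2 for i, ui in enumerate(u, start=1))


std_normal = NormalDist()
uniform_sample = [0.2, 0.9]
normal_sample = [std_normal.inv_cdf(p) for p in uniform_sample]

# Same probability-integral values, different F: the statistic is unchanged.
print(smirnov_n_omega2(uniform_sample, lambda x: x))   # 1/15
print(smirnov_n_omega2(normal_sample, std_normal.cdf))  # 1/15 up to rounding
```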


Difference of Two Means

21.24. A common case occurring in practice is that of two independent samples of
$n_1$ and $n_2$ members from two populations which may or may not be different. We wish
to decide whether the evidence indicates a significant difference between the parent means.
This situation forms a kind of border-line case between the testing of a prior value of a
parameter and the homogeneity tests which we shall consider below. It is a test of homo-
geneity in the sense that we are to discuss the question whether two populations are equal
in certain respects; but we do not necessarily assume that they are identical, and in any
case we can regard the problem as equivalent to the testing of a single parameter (the
difference of the means) to see whether it is different from zero.


21.25. For large samples we discussed the question in Example 9.10 (Vol. I, p. 226)
and gave two tests. If the hypothesis is that the parent populations are identical (a true 
hypothesis of homogeneity) we may pool the samples to form a single sample and test 
whether either mean differs from the mean of the total. If, however, we wish to test the 
less general hypothesis that the parents have the same mean but not necessarily the same 
variance, we may test the difference of means by the ordinary equation expressing the 
variance of a difference in terms of the separate variances. This is not a homogeneity test 
in the strictest sense of the word, but tests of such a character may conveniently be dis- 
cussed in conjunction with the other type, both for small and for large samples. 


21.26. We now consider the corresponding problem when the samples are small
and the parent populations are assumed to be normal. In the first place we take the
case when the two populations have the same variance $\sigma^2$.

The sample means $\bar x_1$ and $\bar x_2$ are distributed normally with variances $\sigma^2/n_1$ and $\sigma^2/n_2$ and
means $\mu_1$ and $\mu_2$. Consequently $\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)$ is distributed normally with variance
$\sigma^2(1/n_1 + 1/n_2)$, and hence

$$\frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{\sigma}\sqrt{\frac{n_1 n_2}{n_1 + n_2}} \qquad (21.30)$$

is distributed normally with unit variance about zero mean. Further, if $S_1^2$ and $S_2^2$ are
the sample sums of squares about the mean, the quantity

$$\chi^2 = \frac{1}{\sigma^2}(S_1^2 + S_2^2) \qquad (21.31)$$

is distributed as $\chi^2$ with $n_1 + n_2 - 2$ degrees of freedom, independently of the expression
(21.30). It follows that, writing $S^2 = S_1^2 + S_2^2$,

$$u = \frac{\bar x_1 - \bar x_2 - (\mu_1 - \mu_2)}{S}\sqrt{\frac{n_1 n_2(n_1 + n_2 - 2)}{n_1 + n_2}} \qquad (21.32)$$

is distributed like “Student’s” $t$ with $\nu = n_1 + n_2 - 2$ degrees of freedom. This expression
does not contain the unknown $\sigma$ and hence may be used to test the difference $\mu_1 - \mu_2$.
This result is due to Fisher (1926a).

Example 21.4

In a class of 20 children, 10 chosen at random were given a ration of orange-juice
each day for a certain period and the other 10 a ration of milk. Their gains in weight
during the period were, in pounds:

First group: 4, 2½, 3½, 4, 1½, 1, 3½, 3, 2½, 3½

Second group: 1½, 3½, 2½, 3, 2½, 2, 2, 2½, 1½, 3

The mean increase in the first group is 2.9 pounds, and in the second 2.4 pounds. Putting
aside other explanations, one possible factor accounting for this difference is the difference
in treatments. But we wish to know in the first place whether this is significant. We
assume, then, that treatment exerted no differential effect and that the samples came
from normal populations with the same mean and variance. We find

$\bar x_1 = 2.9$, $\bar x_2 = 2.4$,

$\sum(x_1 - \bar x_1)^2 = 9.40$, $\sum(x_2 - \bar x_2)^2 = 3.90$.

Hence, from (21.32), with $\mu_1 - \mu_2 = 0$,

$\nu = 10 + 10 - 2 = 18$,

$$u = \frac{0.5}{\sqrt{13.30}}\sqrt{\frac{10 \times 10 \times 18}{20}} = 1.30.$$

From Appendix Table 3 (Vol. I, p. 441) we see that such a value would be exceeded in
absolute value with probability 0.21. The difference of a half-pound between the sample
means is not significant.

We note incidentally that the sample variances, 0.940 and 0.390, differ considerably,
and shall see below how the significance of the difference may be tested. At the present
stage our conclusion as to the non-significance of the difference of means is to be regarded
with reserve, for the data themselves suggest that we have over-simplified the problem
in assuming equal variance in the two populations.
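The arithmetic of this example can be reproduced with a short Python sketch. It computes $u$ of (21.32) from the raw gains and, in place of Appendix Table 3, obtains the two-sided tail probability by integrating Student's density with Simpson's rule; that numerical integration is an implementation choice for illustration, not the book's method.

```python
import math


def student_two_sided_p(t, nu, steps=2000):
    """P(|T| > |t|) for Student's t with nu degrees of freedom (Simpson's rule on the density)."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def density(x):
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    area = density(0.0) + density(b)
    area += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 1 - 2 * area * h / 3  # area * h / 3 approximates P(0 < T < |t|)


def pooled_u(x1, x2):
    """u of (21.32) with mu_1 - mu_2 = 0; S^2 is the pooled sum of squares S_1^2 + S_2^2."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s2 = sum((x - m1) ** 2 for x in x1) + sum((x - m2) ** 2 for x in x2)
    nu = n1 + n2 - 2
    return (m1 - m2) * math.sqrt(n1 * n2 * nu / (n1 + n2)) / math.sqrt(s2), nu


first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]
u, nu = pooled_u(first, second)
print(round(u, 2), nu, round(student_two_sided_p(u, nu), 2))  # roughly 1.3, 18, 0.21
```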

21.27. Apart from the question of unequal variances, the data of the previous 
example will serve to illustrate a further point of interest. Our hypothesis is that the 
children within each group may be regarded as a sample from a population with the same
mean. Had we been dealing with a sample of, say, seedlings grown from the seed of a 
single plant, this hypothesis would not have been unreasonable ; but children differ very 
much among themselves in nutritional standard, and so forth. Our hypothesis is again 
liable to over-simplify the problem. 






When the statistician can direct the sampling himself, this kind of problem can be 
tackled with success by pairing. Suppose we select children in pairs of the same sex, 
each pair resembling each other as closely as possible in all the factors which might influence 
the experiment such as age, weight and nutritional standard. We allot at random one 
member to the first group and one to the second, and so for each pair. The differences 
in weights gained between members of a pair may then be regarded as samples from 
a population with zero mean, even if the pairs differ among themselves, and the set of 
differences tested in the usual way. 

Example 21.5 

Suppose that, in the previous example, the data had related to 10 pairs of children, 
thus : — 


No. of    First Group    Second Group    Difference,
Pair.     wt. in lbs.    wt. in lbs.     First - Second.

  1           4              1½              2½
  2           2½             3½             -1
  3           3½             2½              1
  4           4              3               1
  5           1½             2½             -1
  6           1              2              -1
  7           3½             2               1½
  8           3              2½              ½
  9           2½             1½              1
 10           3½             3               ½

Totals       29             24               5


For the values in the last column we find

$\bar x = 0.5$, $s^2 = 1.25$, $\nu = 9$,

$$t = \frac{0.5}{\sqrt{1.25}}\sqrt{9} = 1.34.$$

The probability of obtaining such a value or greater (absolutely) is about 0.22, and
the observed differences are therefore not significant. This is the same conclusion that
we reached in Example 21.4, but it would not have been surprising had the conclusions
differed, for they relate to different questions.
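The paired calculation is the one-sample $t$ applied to the differences. A minimal sketch follows, with the pairs taken in the order of the table and $s^2$ computed with divisor $n$, as in the text, so that $t = \bar x\sqrt{n-1}/s$.

```python
import math

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]   # pairs 1-10, first member
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]  # pairs 1-10, second member


def paired_t(a, b):
    """Student's t for the mean of paired differences, with s^2 using divisor n as in the text."""
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    mean = sum(d) / n
    var = sum((x - mean) ** 2 for x in d) / n
    return mean, var, mean / math.sqrt(var) * math.sqrt(n - 1), n - 1


mean, var, t, nu = paired_t(first, second)
print(mean, var, round(t, 2), nu)  # 0.5 1.25 1.34 9
```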


Difference of Means when Variances are Unequal

21.28. When population variances are not assumed equal the $t$-test of difference
of means no longer applies. We can, if we choose, apply a test based on fiducial intervals,
namely, the Behrens test, considered in the previous chapter. We put

$$d = \frac{\bar x_1 - \bar x_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}, \qquad (21.33)$$

where $s_1^2 = S_1^2/(n_1 - 1)$ and $s_2^2 = S_2^2/(n_2 - 1)$. The fiducial limits of $d$ for various
significance levels have been tabulated by Sukhatme (1938b) and Fisher (1941a) for $n_1$ and $n_2$
greater than 5. If the observed $d$ falls inside the range, we may accept the hypothesis that
the population means are equal.

21.29. As we have seen, an inference of this kind does not imply that we shall be
correct in a certain proportion of the cases, and if we wish to find a test satisfying such
a criterion a different approach is necessary. The following investigation is due to Welch
(1938b).

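Consider the distribution of $u$ of equation (21.32) when the means are the same but
the variances are different, i.e. $\sigma_1 \neq \sigma_2$. We may write

$$u = \frac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{S_1^2 + S_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}. \qquad (21.34)$$

Put

$$\chi = \frac{\bar x_1 - \bar x_2}{\sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}}, \qquad (21.35)$$

$$w = \frac{\sigma_1^2\chi_1^2 + \sigma_2^2\chi_2^2}{n_1 + n_2 - 2}\cdot\frac{1/n_1 + 1/n_2}{\sigma_1^2/n_1 + \sigma_2^2/n_2}, \qquad (21.36)$$

where $\sigma_1^2\chi_1^2 = S_1^2$, and hence $\chi_1^2$ is distributed as $\chi^2$ with $\nu_1 = n_1 - 1$ degrees of freedom,
and similarly for $\chi_2^2$. $\chi$ is regarded as a single normal variate with zero mean and
unit variance. We have then

$$u = \frac{\chi}{\sqrt{w}}. \qquad (21.37)$$

Now put

$$w = a\chi_1^2 + b\chi_2^2, \qquad (21.38)$$

where, from (21.36),

$$a = \frac{\sigma_1^2\,(1/n_1 + 1/n_2)}{(n_1 + n_2 - 2)(\sigma_1^2/n_1 + \sigma_2^2/n_2)}, \qquad b = \frac{\sigma_2^2\,(1/n_1 + 1/n_2)}{(n_1 + n_2 - 2)(\sigma_1^2/n_1 + \sigma_2^2/n_2)}. \qquad (21.39)$$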


$w$ itself is not distributed in the Type III form unless $\sigma_1 = \sigma_2$, but we will find a distribution
of that form which approximates to it by equating lower moments. The first two moments
of $w$, being the sum of the separate parts, are

$$\mu_1'(w) = a\nu_1 + b\nu_2, \qquad \mu_2(w) = 2(a^2\nu_1 + b^2\nu_2). \qquad (21.40)$$

The moments of

$$dF = \frac{1}{(2g)^{\frac{1}{2}\nu}\,\Gamma(\frac{1}{2}\nu)}\,w^{\frac{1}{2}\nu - 1}e^{-w/2g}\,dw$$

are

$$\mu_1' = g\nu, \qquad \mu_2 = 2g^2\nu. \qquad (21.41)$$
Identifying (21.40) and (21.41) we find

$$g = \frac{a^2\nu_1 + b^2\nu_2}{a\nu_1 + b\nu_2}, \qquad \nu = \frac{(a\nu_1 + b\nu_2)^2}{a^2\nu_1 + b^2\nu_2}. \qquad (21.42)$$

With these values of $g$ and $\nu$ the distribution of $w/g$ is approximately of the Type III form
with $\nu$ degrees of freedom and will be independent of $\chi$. Hence

$$\frac{\chi}{\sqrt{w/(g\nu)}} = u\sqrt{g\nu} \qquad (21.43)$$

is distributed approximately as “Student’s” $t$ with $\nu$ degrees of freedom. In particular,
if $\sigma_1 = \sigma_2$, $a = b$ and we reduce to the test of 21.26.


21.30. In general, when $\sigma_1 \neq \sigma_2$, the quantities $g$ and $\nu$ depend on the ratio
$\theta = \sigma_1^2/\sigma_2^2$. We have

$$\nu = \frac{(\nu_1\theta + \nu_2)^2}{\nu_1\theta^2 + \nu_2}, \qquad (21.44)$$

and may put $u = ct$, where $c = 1/\sqrt{g\nu}$, and hence

$$c^2 = \frac{(\nu_1 + \nu_2)(\theta/n_1 + 1/n_2)}{(\nu_1\theta + \nu_2)(1/n_1 + 1/n_2)}. \qquad (21.45)$$

Without a definite knowledge of $\theta$ we cannot apply the $t$-test, but the advantage of putting
the expressions in this form is that by considering particular values of $\theta$ we are able to
judge how far the test based on “Student’s” distribution is likely to be affected.


Example 21.6 (from Welch, 1938b)

Consider the case $n_1 = n_2 = 10$. From (21.45) we have $c = 1$ and from (21.44)

$$\nu = \frac{9(\theta + 1)^2}{\theta^2 + 1}.$$

Suppose now we were to use the test of 21.26, based on the assumption that $\theta = 1$. We
should find, to a probability level of 0.05, that $|u|$ must exceed 2.101 to be significant.
If we judge $u$ significant for such values, how far are we in error when $\theta$ is not unity? That
is to say, what are the true probabilities

$$P\{|u| > 2.101\}$$

for varying values of $\theta$, as compared with our value of 0.05?

For a specified $\theta$ the probabilities can easily be obtained from the approximate
distribution $u\sqrt{g\nu}$ of equation (21.43). They are shown graphically in Fig. 21.1. The full
line (a) shows $P$ for various values of $\theta$ and $n_1 = n_2 = 10$. The full line (b) shows similarly
the values for $n_1 = 5$, $n_2 = 15$. (The dotted line (c) we refer to below.)

[Fig. 21.1. Values of $P$ against values of $\theta$ (logarithmic scale) for the cases (a), (b) and (c), with the level $P = 0.05$ marked.]

In case (a) the line does not deviate very much from the horizontal at $P = 0.05$, and we may
conclude that the test based on the assumption of equal variance is not very much in error.
In any case, if the curve falls below the line $P = 0.05$ we are on the safe side, for our true
probability is then less than 0.05, and in rejecting the hypothesis at that level we are adopting
more stringent standards than is apparent.

In case (b), when the sample numbers are unequal, we have a different state of affairs. For
$\theta < 1$ the test is very conservative, but for $\theta > 1$ it may err very seriously in the wrong direction.
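Curves like (a) and (b) can be approximated numerically from (21.43)-(21.45). The Python sketch below is not Welch's own computation: it takes $\nu$ and $c^2 = 1/(g\nu)$ as functions of $\theta$, uses $P\{|u| > 2.101\} = P\{|t| > 2.101/c\}$ with $t$ on $\nu$ (possibly fractional) degrees of freedom, and integrates Student's density by Simpson's rule. The values of $\theta$ tried are arbitrary.

```python
import math


def student_two_sided_p(t, nu, steps=2000):
    """P(|T| > |t|) for Student's t with nu (possibly fractional) degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def density(x):
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    b = abs(t)
    h = b / steps  # Simpson's rule on [0, |t|]; steps is even
    area = density(0.0) + density(b)
    area += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 1 - 2 * area * h / 3


def welch_u_size(theta, n1, n2, critical=2.101):
    """Approximate P{|u| > critical} for the pooled statistic u when theta = sigma1^2 / sigma2^2."""
    v1, v2 = n1 - 1, n2 - 1
    nu = (v1 * theta + v2) ** 2 / (v1 * theta**2 + v2)  # (21.44)
    c2 = (v1 + v2) * (theta / n1 + 1 / n2) / ((v1 * theta + v2) * (1 / n1 + 1 / n2))  # 1/(g nu)
    # u = c t, so |u| > critical exactly when |t| > critical / c
    return student_two_sided_p(critical / math.sqrt(c2), nu)


for theta in (0.1, 1.0, 10.0):
    print(theta, round(welch_u_size(theta, 10, 10), 3), round(welch_u_size(theta, 5, 15), 3))
```

With equal sample numbers the computed size stays near 0.05; with $n_1 = 5$, $n_2 = 15$ it falls well below 0.05 for $\theta < 1$ and rises well above it for $\theta > 1$, as described above.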


21.31. Welch concludes that for samples of equal size there is not a serious likelihood
of error in testing the difference of means as if the parent variances were equal. For
samples of unequal size the error may invalidate the $t$-test, and an alternative criterion is
proposed. Write

$$v = \frac{\bar x_1 - \bar x_2}{\left\{\dfrac{S_1^2}{n_1(n_1 - 1)} + \dfrac{S_2^2}{n_2(n_2 - 1)}\right\}^{1/2}}. \qquad (21.46)$$

Here, it will be observed, the denominator is an estimate of $(\sigma_1^2/n_1 + \sigma_2^2/n_2)^{1/2}$, the standard
deviation of the difference $\bar x_1 - \bar x_2$. Precisely as for $u$ we approximate to the distribution
of this denominator by a Type III form. Corresponding to (21.39) we find

$$a = \frac{\sigma_1^2}{n_1(n_1 - 1)(\sigma_1^2/n_1 + \sigma_2^2/n_2)}, \qquad b = \frac{\sigma_2^2}{n_2(n_2 - 1)(\sigma_1^2/n_1 + \sigma_2^2/n_2)}. \qquad (21.47)$$

Corresponding to (21.45) we find $c = 1$, and to (21.44)

$$\nu = \frac{(\theta/n_1 + 1/n_2)^2}{\dfrac{\theta^2}{n_1^2(n_1 - 1)} + \dfrac{1}{n_2^2(n_2 - 1)}}. \qquad (21.48)$$

$v$ is then distributed approximately in “Student’s” form with $\nu$ degrees of freedom. The
dotted line (c) in Fig. 21.1 shows the relationship between $\theta$ and $P\{|v| > 2.101\}$ for
$n_1 = 5$, $n_2 = 15$. Clearly the error is now much smaller than when we used $u$ for the same
sample numbers.
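A sketch of Welch's $v$ in Python, under the same numerical assumptions as before (Simpson integration of Student's density, arbitrary trial values of $\theta$). It computes $\nu$ from (21.48), the size of the nominal test $|v| > 2.101$ for $n_1 = 5$, $n_2 = 15$, and $v$ itself for the data of Example 21.4. With equal sample numbers $v$ coincides with $u$, and $\nu = 2(n - 1)$ when $\theta = 1$.

```python
import math


def student_two_sided_p(t, nu, steps=2000):
    """P(|T| > |t|) for Student's t with nu (possibly fractional) degrees of freedom."""
    c = math.exp(math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)) / math.sqrt(nu * math.pi)

    def density(x):
        return c * (1 + x * x / nu) ** (-(nu + 1) / 2)

    b = abs(t)
    h = b / steps
    area = density(0.0) + density(b)
    area += sum((4 if k % 2 else 2) * density(k * h) for k in range(1, steps))
    return 1 - 2 * area * h / 3


def welch_v_df(theta, n1, n2):
    """nu of (21.48) for Welch's v, given theta = sigma1^2 / sigma2^2."""
    numerator = (theta / n1 + 1 / n2) ** 2
    denominator = theta**2 / (n1**2 * (n1 - 1)) + 1 / (n2**2 * (n2 - 1))
    return numerator / denominator


def welch_v(x1, x2):
    """v of (21.46): difference of means over the estimated standard deviation of the difference."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    ss1 = sum((x - m1) ** 2 for x in x1)
    ss2 = sum((x - m2) ** 2 for x in x2)
    return (m1 - m2) / math.sqrt(ss1 / (n1 * (n1 - 1)) + ss2 / (n2 * (n2 - 1)))


# Approximate true size of the nominal 5 per cent test |v| > 2.101 for n1 = 5, n2 = 15
for theta in (0.1, 1.0, 10.0):
    print(theta, round(student_two_sided_p(2.101, welch_v_df(theta, 5, 15)), 3))

first = [4, 2.5, 3.5, 4, 1.5, 1, 3.5, 3, 2.5, 3.5]
second = [1.5, 3.5, 2.5, 3, 2.5, 2, 2, 2.5, 1.5, 3]
v = welch_v(first, second)
print(round(v, 2))  # equal sample numbers, so v equals u of Example 21.4
```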
Difference of Two Variances in Normal Samples

21.32. If we have samples of $n_1$ and $n_2$ members from normal populations with
variances $\sigma_1^2$ and $\sigma_2^2$, and $s_1^2$, $s_2^2$ are the sample variances (divisor $n$), the ratio
$\rho^2 = s_1^2/s_2^2$ is distributed, when $\sigma_1 = \sigma_2$, in the form (cf. Example 10.18, Vol. I, p. 249)

$$dF \propto \frac{\rho^{n_1 - 2}}{(n_1\rho^2 + n_2)^{\frac{1}{2}(n_1 + n_2 - 2)}}\,d\rho. \qquad (21.49)$$

The related quantity

$$z = \tfrac{1}{2}\log_e\frac{n_1\nu_2\,\rho^2}{n_2\nu_1} \qquad (21.50)$$

is distributed in Fisher's form

$$dF \propto \frac{e^{\nu_1 z}}{(\nu_1 e^{2z} + \nu_2)^{\frac{1}{2}(\nu_1 + \nu_2)}}\,dz, \qquad (21.51)$$

where $\nu_1 = n_1 - 1$, $\nu_2 = n_2 - 1$. The $\nu$'s may, by a convenient extension of our previous
terminology, be called the degrees of freedom associated with $z$. In practice, $z$ is generally
used in preference to $\rho$, but tables of both are available.

These distributions provide a test of significance of the hypothesis that $\sigma_1^2/\sigma_2^2 = 1$.
On this hypothesis they contain no unknown parameter, and the probability of
an observed $\rho$ or $z$ can be obtained. As usual, if this is small we reject the hypothesis.
We leave it to the reader to show that this type of inference can be based on the theory
of confidence intervals or the theory of fiducial intervals in the usual way.


Example 21.7

In Example 21.4 we had two samples of children and found that the difference in
means was not significant. This was on the hypothesis that the variances were identical,
and since the two samples are equal in number the inference remains valid even if the
variances are different, as illustrated in 21.31. We will now test directly whether the
sample variances themselves indicate any significant difference in parent variances.

We have

$\sum(x_1 - \bar x_1)^2 = 9.40$, $\nu_1 = 9$,

$\sum(x_2 - \bar x_2)^2 = 3.90$, $\nu_2 = 9$.

Hence

$$z = \tfrac{1}{2}\log_e\left(\frac{9.40}{9}\Big/\frac{3.90}{9}\right) = 0.4399.$$

From Appendix Tables 4 and 5 of Vol. I (pp. 442-3) we see that for $\nu_2 = 9$ the 5 per cent
points of $z$ are

$\nu_1 = 8$: 0.5862;  $\nu_1 = 12$: 0.5613,

and the 1 per cent points are

$\nu_1 = 8$: 0.8494;  $\nu_1 = 12$: 0.8157.

Thus, notwithstanding that one variance is about 2½ times the other, the probability that
the observed $z$ will be exceeded on random sampling from populations with the same
variance is greater than 0.05, and the difference of sample variances is not significant.
There is a point here which is frequently overlooked. In carrying out the $z$-test we
always take the ratio of the larger variance to the smaller, so that our probability levels
relate, not to the chance that a given pair of variances have a larger ratio than the observed
one, but to the chance that the bigger of the two exceeds the smaller in a certain ratio.
A probability of 0.05 thus relates to the chance that either $s_1^2/s_2^2$ exceeds a given amount
$k$, or $s_1^2/s_2^2$ falls short of a given amount $1/k$. If we are interested only in the former
contingency our probabilities should be halved.
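Example 21.7 can be checked by integrating Fisher's density (21.51) directly. The sketch below supplies the normalising constant $2\nu_1^{\nu_1/2}\nu_2^{\nu_2/2}/B(\frac{1}{2}\nu_1, \frac{1}{2}\nu_2)$ (a standard result, not stated in the text) and uses Simpson's rule; the integration width and step count are assumptions chosen so that the neglected tail is negligible for these degrees of freedom. It reproduces the observed $z$ and confirms that the tabulated 5 per cent point for $\nu_1 = 8$, $\nu_2 = 9$ cuts off a tail of about 0.05.

```python
import math


def z_log_density(z, v1, v2):
    """Log of Fisher's z density (21.51), including its normalising constant."""
    log_beta = math.lgamma(v1 / 2) + math.lgamma(v2 / 2) - math.lgamma((v1 + v2) / 2)
    return (math.log(2) + 0.5 * v1 * math.log(v1) + 0.5 * v2 * math.log(v2) - log_beta
            + v1 * z - 0.5 * (v1 + v2) * math.log(v1 * math.exp(2 * z) + v2))


def z_upper_tail(z0, v1, v2, width=12.0, steps=6000):
    """P(z > z0) by Simpson's rule on [z0, z0 + width]; beyond that the density is negligible here."""
    h = width / steps

    def f(z):
        return math.exp(z_log_density(z, v1, v2))

    total = f(z0) + f(z0 + width)
    total += sum((4 if k % 2 else 2) * f(z0 + k * h) for k in range(1, steps))
    return total * h / 3


z_obs = 0.5 * math.log((9.40 / 9) / (3.90 / 9))
print(round(z_obs, 4))                       # 0.4399
print(round(z_upper_tail(z_obs, 9, 9), 3))   # tail area beyond the observed z
print(round(z_upper_tail(0.5862, 8, 9), 3))  # tabulated 5 per cent point: about 0.05
```

The tail area beyond the observed $z$ comes out well above 0.05, in line with the conclusion of the example.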


Properties of Fisher's Distribution

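21.33. The $z$-distribution plays a very important part in statistical inference based
on small samples, and we digress at this point to give an account of its main features.

The distribution function of $z$ may be obtained from the incomplete $B$-function, for
$z$ may be easily transformed into a Type I variate. There are, however, special tables
for lower values of $\nu_1$ and $\nu_2$ and satisfactory approximations of various kinds for higher
values.

The characteristic function of $z$ is proportional to

$$\int_{-\infty}^{\infty}\frac{e^{(\nu_1 + \theta)z}}{(\nu_1 e^{2z} + \nu_2)^{\frac{1}{2}(\nu_1 + \nu_2)}}\,dz,$$

where $\theta = it$, and is thus

$$\phi(t) = \left(\frac{\nu_2}{\nu_1}\right)^{\frac{1}{2}\theta}\frac{\Gamma\{\frac{1}{2}(\nu_1 + \theta)\}\,\Gamma\{\frac{1}{2}(\nu_2 - \theta)\}}{\Gamma(\frac{1}{2}\nu_1)\,\Gamma(\frac{1}{2}\nu_2)}. \qquad (21.52)$$

Thus, taking logarithms and using the expansion

$$\log\Gamma(1 + x) = \tfrac{1}{2}\log 2\pi + (x + \tfrac{1}{2})\log x - x + \frac{1}{12x} - \cdots,$$

we find

$$\log\phi(t) = -\frac{\theta}{2}\left(\frac{1}{\nu_1} - \frac{1}{\nu_2}\right) + \frac{\theta^2}{4}\left(\frac{1}{\nu_1} + \frac{1}{\nu_2}\right) + \cdots. \qquad (21.53)$$

Thus, for large $\nu_1$ and $\nu_2$, $z$ is distributed normally with mean
$-\frac{1}{2}\left(\frac{1}{\nu_1} - \frac{1}{\nu_2}\right)$ and variance $\frac{1}{2}\left(\frac{1}{\nu_1} + \frac{1}{\nu_2}\right)$.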


21.34. Various approximations have been given for the case when $\nu_1$ and $\nu_2$ are
not large enough to justify the assumption of normality.

(a) (Cornish and Fisher, 1937). The method is that of 6.32 and depends on the
expansion of the distribution in a Gram-Charlier series. From the successive derivatives
of $\log\Gamma(1 + x)$ we can find those of $\log\phi(t)$, and hence ascertain the cumulants of $z$.
Writing $r_1 = 1/\nu_1$ and $r_2 = 1/\nu_2$, we find

$$\begin{aligned}
\kappa_1 &= -\tfrac{1}{2}(r_1 - r_2) - \tfrac{1}{6}(r_1^2 - r_2^2),\\
\kappa_2 &= \tfrac{1}{2}(r_1 + r_2) + \tfrac{1}{2}(r_1^2 + r_2^2) + \tfrac{1}{3}(r_1^3 + r_2^3),\\
\kappa_3 &= -\tfrac{1}{2}(r_1^2 - r_2^2) - (r_1^3 - r_2^3),\\
\kappa_4 &= (r_1^3 + r_2^3) + 3(r_1^4 + r_2^4),\\
\kappa_5 &= -3(r_1^4 - r_2^4),\\
\kappa_6 &= 12(r_1^5 + r_2^5).
\end{aligned} \qquad (21.54)$$
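The first two cumulants in (21.54) can be checked against the exact density. The sketch below computes the mean and variance of $z$ for $\nu_1 = 24$, $\nu_2 = 60$ by Simpson integration of (21.51) (normalising constant as in the standard result; range and step count are assumptions) and compares them with the series for $\kappa_1$ and $\kappa_2$. The neglected terms are of order $r^4$ and $r^5$, so close agreement is expected for degrees of freedom of this size.

```python
import math


def z_density(z, v1, v2):
    """Fisher's z density (21.51) with its normalising constant."""
    log_beta = math.lgamma(v1 / 2) + math.lgamma(v2 / 2) - math.lgamma((v1 + v2) / 2)
    log_f = (math.log(2) + 0.5 * v1 * math.log(v1) + 0.5 * v2 * math.log(v2) - log_beta
             + v1 * z - 0.5 * (v1 + v2) * math.log(v1 * math.exp(2 * z) + v2))
    return math.exp(log_f)


def numerical_mean_var(v1, v2, lo=-3.0, hi=3.0, steps=6000):
    """Mean and variance of z by Simpson's rule over [lo, hi]; the factor h/3 cancels in the ratios."""
    h = (hi - lo) / steps
    m0 = m1 = m2 = 0.0
    for k in range(steps + 1):
        weight = 1 if k in (0, steps) else (4 if k % 2 else 2)
        z = lo + k * h
        f = weight * z_density(z, v1, v2)
        m0 += f
        m1 += f * z
        m2 += f * z * z
    mean = m1 / m0
    return mean, m2 / m0 - mean**2


def series_k1_k2(v1, v2):
    """kappa_1 and kappa_2 from (21.54), with r = 1/nu."""
    r1, r2 = 1 / v1, 1 / v2
    k1 = -(r1 - r2) / 2 - (r1**2 - r2**2) / 6
    k2 = (r1 + r2) / 2 + (r1**2 + r2**2) / 2 + (r1**3 + r2**3) / 3
    return k1, k2


print(numerical_mean_var(24, 60))
print(series_k1_k2(24, 60))
```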
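Hence, putting $\sigma = r_1 + r_2$ and $\delta = r_1 - r_2$, we find for the $l$'s of 6.32 (mean $= 0$,
variance $= \frac{1}{2}\sigma$)

$$l_1 = -\tfrac{1}{2}\delta - \tfrac{1}{6}\sigma\delta, \qquad l_2 = \tfrac{1}{4}(\sigma^2 + \delta^2) + \tfrac{1}{12}\sigma(\sigma^2 + 3\delta^2),$$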


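and so on. After some reduction we find, for the value of $z$ corresponding to a probability
$\alpha$ (which in turn corresponds to a normal deviate $\xi$),

$$\begin{aligned}
z ={}& \xi\sqrt{\tfrac{1}{2}\sigma} - \tfrac{1}{6}\delta(\xi^2 + 2) + \sqrt{\tfrac{1}{2}\sigma}\left\{\frac{\sigma}{24}(\xi^3 + 3\xi) + \frac{\delta^2}{72\sigma}(\xi^3 + 11\xi)\right\}\\
&- \frac{\sigma\delta}{120}(\xi^4 + 9\xi^2 + 8) + \frac{\delta^3}{3240\sigma}(3\xi^4 + 7\xi^2 - 16)\\
&+ \sqrt{\tfrac{1}{2}\sigma}\left\{\frac{\sigma^2}{1920}(\xi^5 + 20\xi^3 + 15\xi) + \frac{\delta^2}{2880}(\xi^5 + 44\xi^3 + 183\xi) + \frac{\delta^4}{155520\,\sigma^2}(9\xi^5 - 284\xi^3 - 1513\xi)\right\}.
\end{aligned} \qquad (21.55)$$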
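Because the Cornish-Fisher series for $z$ (the expansion (21.55)) is long and easy to mistype, a direct term-by-term transcription is a useful check. In this sketch $\sigma = 1/\nu_1 + 1/\nu_2$, $\delta = 1/\nu_1 - 1/\nu_2$, and $\xi$ is the normal deviate for the one-tailed level, taken from Python's `statistics.NormalDist`. For $\nu_1 = 24$, $\nu_2 = 60$ it gives about 0.3746 at 1 per cent and 0.4955 at 0.1 per cent, agreeing with the exact values quoted for that case in the comparison of 21.34 (b).

```python
import math
from statistics import NormalDist


def z_cornish_fisher(level, v1, v2):
    """Upper `level` point of z from the Cornish-Fisher series (level=0.01 means 1 per cent)."""
    xi = NormalDist().inv_cdf(1 - level)
    s = 1 / v1 + 1 / v2   # sigma in the text
    d = 1 / v1 - 1 / v2   # delta in the text
    root = math.sqrt(s / 2)
    return (xi * root
            - d / 6 * (xi**2 + 2)
            + root * (s / 24 * (xi**3 + 3 * xi) + d**2 / (72 * s) * (xi**3 + 11 * xi))
            - s * d / 120 * (xi**4 + 9 * xi**2 + 8)
            + d**3 / (3240 * s) * (3 * xi**4 + 7 * xi**2 - 16)
            + root * (s**2 / 1920 * (xi**5 + 20 * xi**3 + 15 * xi)
                      + d**2 / 2880 * (xi**5 + 44 * xi**3 + 183 * xi)
                      + d**4 / (155520 * s**2) * (9 * xi**5 - 284 * xi**3 - 1513 * xi)))


for level in (0.01, 0.001):
    print(level, round(z_cornish_fisher(level, 24, 60), 4))
```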


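(b) (Fisher, extended by Cochran, 1940a). Writing $\nu$ indifferently for $\nu_1$ and $\nu_2$, and with
$\sigma = 1/\nu_1 + 1/\nu_2$, $\delta = 1/\nu_1 - 1/\nu_2$ as before, we have, from (21.55), to order $\nu^{-3/2}$,

$$z = \xi\sqrt{\tfrac{1}{2}\sigma} - \tfrac{1}{6}\delta(\xi^2 + 2) + \sqrt{\tfrac{1}{2}\sigma}\left\{\frac{\sigma}{24}(\xi^3 + 3\xi) + \frac{\delta^2}{72\sigma}(\xi^3 + 11\xi)\right\}. \qquad (21.56)$$

Put $h = 2/\sigma$. Then

$$z = \frac{\xi}{\sqrt{h}}\left(1 + \frac{\xi^2 + 3}{12h}\right) - \tfrac{1}{6}\delta(\xi^2 + 2) + \frac{\delta^2\sqrt{h}}{144}(\xi^3 + 11\xi).$$

Now

$$\frac{1}{\sqrt{h - \lambda}} = \frac{1}{\sqrt{h}}\left(1 + \frac{\lambda}{2h} + O(h^{-2})\right).$$

Hence, if we put

$$z = \frac{\xi}{\sqrt{h - \lambda}} - \delta\left(\lambda - \tfrac{1}{6}\right), \qquad (21.57)$$

the difference of this quantity from (21.56) is

$$\frac{(\xi^3 + 11\xi)\,\delta^2\sqrt{h}}{144},$$

provided that we take $\lambda = \frac{1}{6}(\xi^2 + 3)$.

The difference is small in virtue of the large denominator and the factor $\delta^2$,
which is small if $\nu_1$ and $\nu_2$ are not too different. Thus we may take $z$ as approximately
given by (21.57). The values of $\lambda$ for various values of the significance level are

Level   40%    30%    20%    10%    5%     1%     0.1%
λ       0.51   0.55   0.62   0.77   0.95   1.40   2.09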
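For the commoner levels of significance the form taken by (21.57) is

20 per cent level: $z = \dfrac{0.8416}{\sqrt{h - 0.62}} - 0.4514\left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)$, (21.58)

5 per cent level: $z = \dfrac{1.6449}{\sqrt{h - 0.95}} - 0.7843\left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)$, (21.59)

1 per cent level: $z = \dfrac{2.3263}{\sqrt{h - 1.40}} - 1.2353\left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)$, (21.60)

0.1 per cent level: $z = \dfrac{3.0902}{\sqrt{h - 2.09}} - 1.9249\left(\dfrac{1}{\nu_1} - \dfrac{1}{\nu_2}\right)$. (21.61)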


The accuracy of the approximation for $\nu_1 = 24$, $\nu_2 = 60$ may be judged from the following
comparison:

Level       Value of z from
per cent.   (21.57).            Exact Value.

20          0.1337              0.1338
1           0.3748              0.3746
0.1         0.4965              0.4955
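The approximation (21.57) is short enough to code directly. In this sketch $h = 2/(1/\nu_1 + 1/\nu_2)$ and $\lambda = (\xi^2 + 3)/6$, with $\xi$ the one-tailed normal deviate from `statistics.NormalDist`; it reproduces the table of $\lambda$ to two decimals and the values from (21.57) for $\nu_1 = 24$, $\nu_2 = 60$.

```python
import math
from statistics import NormalDist


def fisher_cochran_lambda(level):
    """lambda = (xi^2 + 3) / 6, xi being the normal deviate for the one-tailed `level`."""
    xi = NormalDist().inv_cdf(1 - level)
    return (xi**2 + 3) / 6


def z_fisher_cochran(level, v1, v2):
    """Upper `level` point of z from (21.57): xi / sqrt(h - lambda) - (lambda - 1/6)(1/v1 - 1/v2)."""
    xi = NormalDist().inv_cdf(1 - level)
    h = 2 / (1 / v1 + 1 / v2)
    lam = fisher_cochran_lambda(level)
    return xi / math.sqrt(h - lam) - (lam - 1 / 6) * (1 / v1 - 1 / v2)


levels = (0.40, 0.30, 0.20, 0.10, 0.05, 0.01, 0.001)
print([round(fisher_cochran_lambda(a), 2) for a in levels])
for level in (0.20, 0.01, 0.001):
    print(level, round(z_fisher_cochran(level, 24, 60), 4))
```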


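(c) (Paulson, 1942). The Wilson-Hilferty approximation to $\chi^2$ of 12.7 indicates that
$(\chi^2/\nu)^{1/3}$ is distributed normally about mean $1 - \frac{2}{9\nu}$ with variance $\frac{2}{9\nu}$. The ratio $e^{2z}$ itself
is the ratio of two independent quantities distributed as $\chi^2/\nu$ with $\nu_1$ and $\nu_2$ degrees of
freedom. Further, in virtue of Geary's theorem (Vol. I, p. 253) the ratio of their cube roots
leads to a quantity which is approximately normally distributed in standard measure.

We may thus regard

$$\frac{\left(1 - \dfrac{2}{9\nu_2}\right)e^{2z/3} - \left(1 - \dfrac{2}{9\nu_1}\right)}{\sqrt{\dfrac{2}{9\nu_1} + \dfrac{2}{9\nu_2}\,e^{4z/3}}} \qquad (21.62)$$

as approximately normally distributed in standard measure. The approximation seems
remarkably good. For instance, the following shows the exact and approximate values
of $e^{2z}$ for $\nu_1 = 6$, $\nu_2 = 12$.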


Level       e^{2z}, from
per cent.   (21.62).          Exact Value.

20          1.72              1.72
5           3.00              3.00
1           4.85              4.82
0.1         8.58              8.38
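Paulson's form can be inverted numerically to give approximate percentage points of $e^{2z}$ (the variance ratio). In the sketch below the deviate of (21.62) is written in terms of $F = e^{2z}$; it increases with $F$, so simple bisection finds the value at which it equals the normal deviate for the chosen one-tailed level. The bracketing interval $[1, 1000]$ is an assumption that suits these degrees of freedom.

```python
import math
from statistics import NormalDist


def paulson_deviate(F, v1, v2):
    """(21.62) with F = e^{2z}: approximately a standard normal deviate."""
    a, b = 2 / (9 * v1), 2 / (9 * v2)
    root = F ** (1 / 3)
    return ((1 - b) * root - (1 - a)) / math.sqrt(a + b * root**2)


def paulson_point(level, v1, v2, lo=1.0, hi=1000.0):
    """Upper `level` point of F = e^{2z}, found by bisection (the deviate increases with F)."""
    target = NormalDist().inv_cdf(1 - level)
    for _ in range(100):
        mid = (lo + hi) / 2
        if paulson_deviate(mid, v1, v2) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


for level in (0.20, 0.05, 0.01, 0.001):
    print(level, round(paulson_point(level, 6, 12), 2))
```

The printed values can be compared with the column from (21.62) in the table above.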
The Problem of k Samples

21.35. We now proceed to consider the case when we have samples from $k$ different
populations and wish to determine whether there is any evidence of significant differences
between those populations. In some cases the appropriate test can be carried out by the
$\chi^2$-distribution, particularly if the data are grouped, for the groups may then be regarded
as determining the rows of a contingency table and the different samples the columns, and
a homogeneity test applied to the table in the manner of Chapter 12. Again, we may
compare the samples pair by pair by the foregoing methods; but this, apart from being
tedious, does not give us what we want, namely a test of homogeneity of the set of samples
taken together.

21.36. Consider in the first instance the sampling of attributes. Suppose we have $k$
samples from populations in which the true proportions of successes are $\varpi_1, \ldots, \varpi_k$, the observed
proportions being $p_1, \ldots, p_k$ and the sample numbers $n_1, \ldots, n_k$, totalling $n$.

If $p$ is the mean proportion of successes in all samples taken together, and our hypothesis
is that the populations have a common value $\varpi$, $p$ will be an estimate of $\varpi$ and we have for
the variance of $p_j$

$$\operatorname{var} p_j = \frac{pq}{n_j} \quad \text{approximately, where} \quad p = \frac{1}{n}\sum n_j p_j. \qquad (21.63)$$

It follows that $(p_j - p)/\sqrt{pq/n_j}$ will be distributed normally about zero mean with unit
variance, and hence

$$\sum_j\frac{n_j(p_j - p)^2}{pq} \qquad (21.64)$$

is distributed in the Type III form with $k - 1$ degrees of freedom (not $k$, because we have lost
a degree by estimating $p$). Hence the ratio

$$Q^2 = \frac{\sum n_j(p_j - p)^2}{pq(k - 1)} \qquad (21.65)$$

has expectation unity. The quantity $Q$ is called the Lexis ratio, after the author who
first discussed it in detail (Lexis, 1903).*


* Lexis first developed the use of $Q$ in a paper “Über die Theorie der Stabilität statistischer Reihen,”
1879, Conrad's Jahrbücher, 32, 60, reproduced in the reference given above. He dealt, however, only
with the case when all the $n$'s were equal and had no knowledge of the sampling distribution of $Q$. In
practical applications he took as each $n_j$ the average for the group, remarking: “The error thereby
committed can be assessed by carrying out the calculation once with the largest and once with the
smallest base number as $n$.”




Example 21.8

From 1910 to 1919 the numbers of live male and female births in England and Wales
were as follows:

Year     Male Births   Female Births   Total Births   Proportion Male/Total

1910       457,266        439,696         896,962         0.5098
1911       448,933        432,205         881,138         0.5095
1912       445,004        427,733         872,737         0.5099
1913       449,159        432,731         881,890         0.5093
1914       447,184        431,912         879,096         0.5087
1915       415,205        399,409         814,614         0.5097
1916       402,137        383,383         785,520         0.5119
1917       341,361        326,985         668,346         0.5108
1918       339,112        323,549         662,661         0.5117
1919       356,241        336,197         692,438         0.5145

Totals   4,101,602      3,933,800       8,035,402         0.5104


The proportion of male births showed an increase during the war years 1916-1919.
This is a well-known effect of war, but suppose we had noticed it here for the first time.
The natural question is: can the effect be accidental? There is no doubt about its reality,
for the data cover the whole population; but if we suppose that sex at birth is distributed
according to the laws of chance, do the differences observed suggest that in the ten years
concerned there was a significant change in the population (as regards proportion of male
births)? Let us consider the homogeneity test applied to the 10 proportions.

We have $p = 0.5104$, $n = 8{,}035{,}402$, $k - 1 = \nu = 9$, and the sum $\sum n_j(p_j - p)^2$ will
be found to be 19.896,783. Hence

$$Q^2 = \frac{19.896{,}783}{9 \times 0.5104 \times 0.4896}, \qquad Q = 2.974,$$

$$(k - 1)Q^2 = \chi^2 = 79.62.$$

$Q$ is sufficiently far from unity to reject decisively the hypothesis that the data are
homogeneous. A $\chi^2$-test will confirm the conclusion. We infer that, whatever the reason,
the differences in proportions of male births, slight as they are, cannot be accounted for
on the supposition that the distribution of sex is according to chance in samples from
a constant population. We may observe that, had we obtained the same proportions
for a sample one-tenth the size, $\chi^2$ would have been 7.962 and we should not have inferred
non-homogeneity.
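The Lexis calculation can be reproduced from the birth counts. This is a minimal sketch using the figures of the table (checked against the printed row and column totals). Working from the counts rather than from rounded proportions gives $Q$ and $\chi^2$ agreeing with the figures in the text to within rounding.

```python
import math

# (year, male live births, total live births), England and Wales, 1910-1919
births = [
    (1910, 457266, 896962), (1911, 448933, 881138), (1912, 445004, 872737),
    (1913, 449159, 881890), (1914, 447184, 879096), (1915, 415205, 814614),
    (1916, 402137, 785520), (1917, 341361, 668346), (1918, 339112, 662661),
    (1919, 356241, 692438),
]


def lexis(rows):
    """Return p, chi^2 = sum n_j (p_j - p)^2 / (pq) as in (21.64), and Q from (21.65)."""
    k = len(rows)
    n = sum(total for _, _, total in rows)
    p = sum(male for _, male, _ in rows) / n
    q = 1 - p
    chi2 = sum(total * (male / total - p) ** 2 for _, male, total in rows) / (p * q)
    return p, chi2, math.sqrt(chi2 / (k - 1))


p, chi2, Q = lexis(births)
print(round(p, 4), round(chi2, 1), round(Q, 3))
```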


21.37. A similar test may be applied with $k$ samples of variables. Let the samples be

$x_{11}, x_{12}, \ldots, x_{1n_1}$, mean $\bar x_1$,
$x_{21}, x_{22}, \ldots, x_{2n_2}$, mean $\bar x_2$,
. . . . . . . . . .
$x_{k1}, x_{k2}, \ldots, x_{kn_k}$, mean $\bar x_k$.

The variance of the $j$th sample is

$$s_j^2 = \frac{1}{n_j}\sum_{i=1}^{n_j}(x_{ji} - \bar x_j)^2,$$

and an estimate of the population variance may be obtained by taking the weighted mean
of sample variances

$$s_w^2 = \frac{1}{n - k}\sum_{j=1}^{k} n_j s_j^2. \qquad (21.66)$$

Here we have reduced the divisor to $n - k$ so as to correspond with the number of degrees
of freedom.

Furthermore $\bar x_j$ will be distributed with variance $\sigma^2/n_j$, and hence (assuming without
loss of generality that the parent mean is zero)

$$E\sum_{j=1}^{k}\{n_j(\bar x_j - \bar x)^2\} = E\left\{\sum_{j=1}^{k} n_j\bar x_j^2 - n\bar x^2\right\} = k\sigma^2 - \sigma^2 = (k - 1)\sigma^2.$$

Putting then

$$s_b^2 = \frac{1}{k - 1}\sum_{j=1}^{k} n_j(\bar x_j - \bar x)^2, \qquad (21.67)$$

we have another estimate of $\sigma^2$. Within sampling limits $s_w^2$ and $s_b^2$ should be equal. If
they are not, we suspect the homogeneity of the population.


21.38. The above test is a simple form of the analysis of variance, which we shall 
study extensively in Chapters 23 and 24 ; it is therefore unnecessary for us to develop it 
further at the present stage. Essentially the test is one of simultaneous significance of 
differences between means on the assumption that variances are constant. We shall also 
discuss in Chapter 26 a generalisation of the variance ratio for testing the homogeneity 
of a set of variances. 


Example 21.9

The following table (from the Registrar-General's Statistical Review of England and
Wales for 1933, Part II) shows the numbers of males married in England in that year
classified according to age and district. (Certain small numbers of unspecified age and
those under 21 have been omitted.)

                                     Age (Years)
District        21-       25-       30-       35-       45-       55-      Totals

South-East    31,714    43,979    14,995     7,985     3,928     3,717    106,318
North         31,507    39,849    13,620     7,108     3,362     2,916     98,362
Midland       17,465    21,486     6,729     3,340     1,624     1,509     52,153
East           4,016     5,297     1,820       962       457       386     12,938
South-West     4,323     6,065     2,218     1,177       514       580     14,877

Totals        89,025   116,676    39,382    20,572     9,885     9,108    284,648

Note the changes in interval at 25- and 35- years.
The question we shall consider is whether age at marriage differs significantly between
different districts. This might, for example, be an important point if we were about to
sample the population for some quality related to age at marriage, such as the number
of children per family. The data might be regarded as a contingency table and $\chi^2$ used
as a test of independence in the usual way. Here we adopt an alternative by considering
the mean age at marriage in the five different districts.

Taking the centres of the intervals to be 23, 27.5, 32.5, 40, 50 and 57.5 years (the latter
being admittedly an approximation) and making no corrections for grouping, we find:

District            Number     Mean          Sum of Squares    Variance.
                               (years).      of Deviations
                                             from Mean.

South-East          106,318    29.681,700     7,092,490        66.710
North                98,362    29.312,626     6,092,375        61.938
Midland              52,153    29.007,344     3,105,520        59.546
East                 12,938    29.425,761       807,911        62.445
South-West           14,877    29.873,731     1,025,284        68.917

Whole population    284,648    29.429,049    18,143,921        63.741

The total of the sum of squares about district means, $\sum(x_{ji} - \bar x_j)^2$, is the sum of the
figures in the fourth column, namely 18,123,580. The sum of squares $\sum n_j(\bar x_j - \bar x)^2$ is
found to be 20,341. We have the useful check that these two together are equal to the
sum of squares of deviations from the population mean, 18,143,921 (a property which we
shall often require in the analysis of variance).

Thus the estimate (21.66) is

$$\frac{18{,}123{,}580}{284{,}643} = 63.67,$$

and the estimate (21.67) is

$$\frac{20{,}341}{4} = 5085.25.$$

No test of significance is required to see that the difference in mean age at marriage between
districts is not a chance effect.
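The two variance estimates of 21.37 can be computed from the grouped table. The sketch below treats every marriage as occurring at the interval centre (23, 27.5, 32.5, 40, 50, 57.5, as in the text, with no grouping corrections) and uses the district counts of the table, whose figures have been checked against the printed row and column totals. It forms the within-district and between-district sums of squares and the estimates (21.66) and (21.67).

```python
centres = [23, 27.5, 32.5, 40, 50, 57.5]  # interval centres used in the text
table = {
    "South-East": [31714, 43979, 14995, 7985, 3928, 3717],
    "North": [31507, 39849, 13620, 7108, 3362, 2916],
    "Midland": [17465, 21486, 6729, 3340, 1624, 1509],
    "East": [4016, 5297, 1820, 962, 457, 386],
    "South-West": [4323, 6065, 2218, 1177, 514, 580],
}


def grouped_anova(groups, x):
    """Return n, k, grand mean, within-group SS and between-group SS for grouped frequencies."""
    n = sum(sum(f) for f in groups.values())
    k = len(groups)
    grand_mean = sum(fi * xi for f in groups.values() for fi, xi in zip(f, x)) / n
    within = between = 0.0
    for f in groups.values():
        nj = sum(f)
        mean_j = sum(fi * xi for fi, xi in zip(f, x)) / nj
        within += sum(fi * (xi - mean_j) ** 2 for fi, xi in zip(f, x))
        between += nj * (mean_j - grand_mean) ** 2
    return n, k, grand_mean, within, between


n, k, grand_mean, within, between = grouped_anova(table, centres)
s_w2 = within / (n - k)   # estimate (21.66)
s_b2 = between / (k - 1)  # estimate (21.67)
print(round(grand_mean, 4), round(within), round(between), round(s_w2, 2), round(s_b2, 2))
```

The ratio of the two estimates is of the order of 80, which makes plain why no formal test is needed here.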


Tests of Random Order

21.39. The tests described above are concerned with the values of a number of
sample members but not with the order in which these values occur. Sometimes there
may not be an order, as, for instance, if a number of plants are grown simultaneously or
a number of names drawn from a hat in a single handful. More frequently there is a
temporal order of appearance in the values, and it is clear that, on some occasions at least,
the order may be material. To take an extreme case, suppose we are told that in a sample
of 100 births 53 are male. We conclude that the sample is concordant with the hypothesis
that male and female births occur at random with probability ½. But if we knew in addition
that the first 53 births were male and the next 47 female we should almost certainly reject
the hypothesis.

21.40. If sampling is conducted by taking members one at a time from a population
and the process is random, then any order is as probable as any other order. The sample
may be considered as a section of an infinite series generated by the sampling process, and
this series ought to behave like von Mises' Irregular Kollektiv (7.15). It is a happy
hunting-ground for the theorist, since there is no limit to the number of tests which can
be invented to ascertain whether a given finite series conforms to the random scheme. We
have considered a few such tests in connection with random sampling numbers (8.15)
and shall discuss others in connection with time-series (Chapter 30). Here we discuss a
few tests which are useful in detecting departures from randomness in the sampling. We
are not now considering hypotheses as to the parent population, but since the randomness
of the sampling is an essential element of inferences in probability it is convenient to
consider the reliability of the sampling together with inferences from the sample about
the parent.

Ranking Tests

21.41. Suppose we have a sample of n members $x_1, \dots, x_n$, in that order, and are
doubtful about its randomness. Such doubts may arise owing either to defects in the
sampling or to possible alterations in the population while the sampling is going on. In
the first case the process itself is at fault; in the second, circumstances are at work to make
the sample something other than it purports to be, a random sample from a single
population. Either influence may relate the magnitude of the x's to the order in which they
occur, and the values $x_1, \dots, x_n$ are not then in a random order in the sense that any other
order was equally probable.

Let us then consider all the possible orders, n! in number, of the observed values
$x_1, \dots, x_n$. A proportion of these, determined by a significance level of 5 per cent or
1 per cent, say, we will decide to reject as improbable; and we will select as the “improbable”
rankings those which exhibit the systematic appearance of which we are afraid,
and particularly the regular rise or fall in magnitude from $x_1$ to $x_n$. In short, we rank the
sample in order of magnitude, say $X_1, \dots, X_n$, where the X's are a permutation of
the first n integers, and compute a rank correlation coefficient between this order and the
natural order $1, \dots, n$. If the coefficient is large in absolute value (“large” being determined
by the significance level) we suspect the sample of being subject to systematic influences.

Example 21.10 

Thirty persons in the income group £1000-£1600 are asked to supply returns of their 
annual income for some purpose connected with taxation. It is intended to summarise 
their replies by a given date, but when that date arrives only 20 answers have been received. 
This is a frequent event in postal inquiries, even when the return is compulsory, and it 
has to be decided whether the 20 returns may be accepted as representative of the 30. 
There are prior reasons for suspecting that persons with bigger incomes may delay more 
than the others, partly because of difficulty in completing returns and partly because of 
a natural reluctance to part with information which may tell against them.* We therefore
wish to ascertain from the 20 returns whether there is any evidence that persons with
smaller incomes tend to submit returns earlier than those with larger incomes.

* This is an assumption for the purposes of the example and not intended as a statement about
taxation returns in real life.

Suppose the 20 returns give incomes, in that order, of (£ per annum): 1180, 1270, 1400,
1090, 1190, 1260, 1170, 1300, 1290, 1310, 1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360,
1220, 1460. The ranking order is:

No. of sample    1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20
Rank             3   7  17   1   4   6   2  10   9  11   8  13  12  15  18  16  20  14   5  19
Difference      −2  −5 −14   3   1   0   5  −2   0  −1   3  −1   1  −1  −3   0  −3   4  14   1

The sum of squares of differences is 508, and thus the Spearman coefficient of rank
correlation between observed and natural order $1, \dots, n$ is

$$\rho = 1 - \frac{6 \times 508}{7980} = 0.618.$$

The probability of obtaining such a value or greater (16.18) may be found from “Student's”
distribution by putting

$$t = \rho\sqrt{\frac{n-2}{1-\rho^2}}, \qquad \nu = 18,$$

and is found from Appendix Table 3, vol. I, to be about 0.004. The test confirms our
suspicion that size of income is correlated with order of appearance, and if we intend to 
use the mean income of the 20 returns as an estimate of the income in the full 30 we must 
recognise that it may very well be an under-estimate. 
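The ranking test of this example is easy to mechanise. The following is a minimal Python sketch, not the author's procedure: it assumes no tied values, gives rank 1 to the smallest income, reads the twelfth return as 1350 (the value implied by the printed ranks), and the function name `spearman_order_test` is our own.

```python
import math


def spearman_order_test(values):
    """Spearman's rho between order of arrival and rank in magnitude,
    with Student's t = rho * sqrt((n - 2) / (1 - rho**2)) on n - 2 d.f.
    Assumes there are no tied values."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])   # indices, smallest first
    rank = [0] * n
    for r, i in enumerate(order, start=1):
        rank[i] = r
    d2 = sum((rank[i] - (i + 1)) ** 2 for i in range(n))  # sum of squared differences
    rho = 1 - 6 * d2 / (n ** 3 - n)
    t = rho * math.sqrt((n - 2) / (1 - rho ** 2))
    return rank, d2, rho, t


incomes = [1180, 1270, 1400, 1090, 1190, 1260, 1170, 1300, 1290, 1310,
           1280, 1350, 1320, 1380, 1420, 1390, 1470, 1360, 1220, 1460]
rank, d2, rho, t = spearman_order_test(incomes)
print(d2, round(rho, 3), round(t, 2))  # 508 0.618 3.34
```

The tail probability for t on 18 degrees of freedom is then read from a table of Student's distribution, as in the text.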


21.42. It will be noted in this example that we have made no assumption about 
the distribution of incomes in the sample or the population (the latter of which would 
certainly not be normal) and have used the sample values themselves without any reference 
to the question whether they were representative. This does not invalidate our inference, 
which is made within the population of samples obtained by permuting the observed values. 
(Cf. 17.44 and 17.45.) 

21.43. A second test of use in random series, particularly when it is suspected that
cyclical effects are present, may be obtained by counting the occurrences of “peaks” and
“troughs” in the series. A member is said to be a “peak” if it is greater than the two
neighbouring members, and a “trough” if it is less than those members. In either case
it is a “turning-point”. The interval between turning-points is called a “phase”.

Three consecutive observations are required to define a turning-point. If the series
is random the probability that any given three provides a turning-point is ⅔, for the values
$x_1, x_2, x_3$ occur in six orders and in only four of them is the greatest or least value the
middle one. In a series of N terms there are N − 2 sets of three, and hence the expected
number of turning-points p is

$$E(p) = \tfrac{2}{3}(N - 2). \qquad (21.68)$$

The variance and higher moments of p are not so easy to determine. Like the ranking
problems considered in Chapter 16 (to which the present problem is analogous), the
distributions resulting are rather complicated. We quote without proof the results

$$\mu_2(p) = \frac{16N - 29}{90}, \qquad (21.69)$$

$$\mu_3(p) = -\frac{16(N + 1)}{945}, \qquad (21.70)$$

$$\mu_4(p) = \frac{448N^2 - 1976N + 2301}{4725}. \qquad (21.71)$$

As N tends to infinity the distribution tends to normality fairly rapidly, and p may,
for finite N, be taken as normally distributed about mean $\tfrac{2}{3}(N - 2)$ with variance
$(16N - 29)/90$.
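Since every one of the N! orders is equally likely under randomness, (21.68) and (21.69) can be checked exactly by brute force for small N. A minimal Python sketch (function names are our own); the enumeration is only practical for N up to about 8, and `turning_point_z` applies the normal approximation just described:

```python
import math
from fractions import Fraction
from itertools import permutations


def turning_points(xs):
    """Number of interior members greater than both neighbours or less than both."""
    return sum(1 for i in range(1, len(xs) - 1)
               if (xs[i] - xs[i - 1]) * (xs[i + 1] - xs[i]) < 0)


def exact_moments(N):
    """Exact mean and variance of the turning-point count over all N! orders."""
    counts = [turning_points(perm) for perm in permutations(range(N))]
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, var


def turning_point_z(xs):
    """Standardised turning-point count, using the mean (21.68) and variance (21.69)."""
    N = len(xs)
    mean = Fraction(2 * (N - 2), 3)
    var = Fraction(16 * N - 29, 90)
    return (turning_points(xs) - mean) / math.sqrt(var)


print(exact_moments(6))  # (Fraction(8, 3), Fraction(67, 90))
```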

21.44. A further test may be derived from the distribution of phase lengths. The
probability of a phase of length d in a series of d + 1 terms is clearly $2/(d+1)!$, for only
two of the possible permutations are favourable. In a series of length N there are
N − d − 2 possible phases of length d, for d + 3 points are required to determine the
phase. The probability of a phase d in d + 3 terms is

$$2\left[\left\{\frac{1}{(d+1)!} - \frac{1}{(d+2)!}\right\} - \left\{\frac{1}{(d+2)!} - \frac{1}{(d+3)!}\right\}\right] = \frac{2(d^2 + 3d + 1)}{(d+3)!}, \qquad (21.72)$$

and hence the number of phases of length d is

$$\frac{2(N - d - 2)(d^2 + 3d + 1)}{(d+3)!}. \qquad (21.73)$$

Now the number of possible phases is

$$\tfrac{1}{3}(2N - 7) + \frac{2}{N!}, \qquad (21.74)$$

for there is one fewer phase than turning-points, $\tfrac{2}{3}(N - 2)$ in number, and the whole
series may be a phase, which accounts for the term $2/N!$. In practice this is negligible,
and for the probability of a phase d in a series of N we then have (21.73) divided by (21.74),
namely

$$\frac{6(d^2 + 3d + 1)(N - d - 2)}{(d+3)!\,(2N - 7)}. \qquad (21.75)$$

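The moments of this distribution are easily obtained to a very close approximation.
For example,

$$
\begin{aligned}
\mu_1'(d) &= \frac{6}{2N-7}\sum_d \frac{d\,(N-d-2)(d^2+3d+1)}{(d+3)!}\\
&= \frac{6}{2N-7}\sum_d \Big[(N-2)\{(d+3)(d+2)(d+1) - 3(d+3)(d+2) + 5(d+3) - 3\}\\
&\qquad - (d+3)(d+2)(d+1)d + 3(d+3)(d+2)(d+1) - 8(d+3)(d+2) + 13(d+3) - 9\Big]\Big/(d+3)!\\
&= \frac{6}{2N-7}\sum_d \Big[(N-2)\Big\{\frac{1}{d!} - \frac{3}{(d+1)!} + \frac{5}{(d+2)!} - \frac{3}{(d+3)!}\Big\}\\
&\qquad - \frac{1}{(d-1)!} + \frac{3}{d!} - \frac{8}{(d+1)!} + \frac{13}{(d+2)!} - \frac{9}{(d+3)!}\Big].
\end{aligned}
$$

Remembering the rapid convergence of $\sum 1/j!$ to e, we may write this (extending the sum
over d from 1 to infinity)

$$
\begin{aligned}
\mu_1'(d) &= \frac{6}{2N-7}\Big[(N-2)\{e - 1 - 3(e-2) + 5(e-\tfrac{5}{2}) - 3(e-\tfrac{8}{3})\}\\
&\qquad - e + 3(e-1) - 8(e-2) + 13(e-\tfrac{5}{2}) - 9(e-\tfrac{8}{3})\Big]\\
&= \frac{3(N + 7 - 4e)}{2N - 7} \simeq \frac{3}{2}.
\end{aligned}
\qquad (21.76)
$$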
Similarly we find

$$\mu_2(d) = \frac{3\{(8e - 21)N^2 + (4e - 17)N - (48e^2 - 140e + 14)\}}{(2N - 7)^2} \simeq 0.560. \qquad (21.77)$$
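The approximations (21.76) and (21.77) replace finite sums over d by infinite series in e. A minimal Python sketch that sums (21.75) directly over d = 1, …, N − 2 and compares the result with the closed forms (function names are our own):

```python
import math


def phase_probs(N):
    """Probability (21.75) of a phase of length d, for d = 1, ..., N - 2."""
    return {d: 6 * (d * d + 3 * d + 1) * (N - d - 2)
               / (math.factorial(d + 3) * (2 * N - 7))
            for d in range(1, N - 1)}


def phase_moments(N):
    """Mean and variance of d computed directly from the finite sums."""
    probs = phase_probs(N)
    mean = sum(d * p for d, p in probs.items())
    var = sum(d * d * p for d, p in probs.items()) - mean ** 2
    return mean, var


def closed_forms(N):
    """The approximations (21.76) and (21.77), which sum d to infinity."""
    e = math.e
    mean = 3 * (N + 7 - 4 * e) / (2 * N - 7)
    var = 3 * ((8 * e - 21) * N ** 2 + (4 * e - 17) * N
               - (48 * e ** 2 - 140 * e + 14)) / (2 * N - 7) ** 2
    return mean, var


for N in (20, 48, 200):
    print(N, phase_moments(N), closed_forms(N))
```

The two agree to many decimal places because the neglected terms contain $1/(d+3)!$ for d > N − 2; as N grows the variance tends to $(24e - 63)/4 \approx 0.560$.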

21.45. In comparing observed distributions of phases with expected values the
ordinary χ²-test cannot be applied, because the probabilities of the events in a finite series
are not independent. A test of significance has been derived by Wallis and Moore (1941),
who consider a grouping into three categories, d = 1, d = 2 and d ≥ 3. They conclude
that χ² calculated from these three groups can be tested in the usual Type III form
with ν = 2½ if χ² ≥ 6.3. For lower values, $\tfrac{6}{7}\chi^2$ can be tested in that form with ν = 2.
This test is independent of the law of distribution of the variables and is thus of general 
application. It has to be remembered, however, that generality in these matters may 
be offset by loss of sensitivity, and more searching tests may be required in certain cases. 
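A minimal Python sketch of the Wallis and Moore rule, assuming the threshold χ² = 6.3, ν = 2½ above it, and the factor 6/7 with ν = 2 below it. The χ² tail for fractional ν is computed from the power series of the regularised incomplete gamma function; the function names are our own:

```python
import math


def chi2_sf(x, nu):
    """P(chi-squared with nu d.f. >= x); nu need not be an integer.

    Uses the power series of the regularised lower incomplete gamma
    function, adequate for the moderate values met with here."""
    a, h = nu / 2.0, x / 2.0
    if h <= 0:
        return 1.0
    term = math.exp(a * math.log(h) - h - math.lgamma(a + 1))
    lower = term
    n = 1
    while term > 1e-17 * lower:
        term *= h / (a + n)
        lower += term
        n += 1
    return 1.0 - lower


def wallis_moore_p(chi2):
    """Tail probability for chi-squared from phase counts grouped as d = 1, 2, >= 3."""
    if chi2 >= 6.3:
        return chi2_sf(chi2, 2.5)
    return chi2_sf(6 * chi2 / 7, 2)


print(round(wallis_moore_p(0.826), 2))  # 0.7
```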


Example 21.11

The following table shows the deviations from a moving nine-year average of potato
yields in England and Wales for the years 1888-1935 (units are tenths of a ton):


Year   Yield      Year   Yield      Year   Yield      Year   Yield
1888    −6        1900    −7 T      1912   −15 T      1924    −1 T
1889    +2 P      1901    +6 P      1913    +3 P      1925    +2 P
1890    −4 T      1902    −3        1914    +2        1926    −9 T
1891    −3        1903    −1 T      1915    +1        1927    −3
1892    −1        1904    +2 P      1916    −2 T      1928    +9 P
1893    +6 P      1905     0 T      1917    +5 P      1929    +5
1894    −2 T      1906    +1 P      1918    +4        1930    +1
1895    +7 P      1907    −1 T      1919    −4 T      1931   −10 T
1896    +3        1908    +8 P      1920    −3 P      1932    +1
1897    −6 T      1909    +4        1921    −9 T      1933    +2
1898    +2 P      1910    +3 T      1922   +11 P      1934    +5 P
1899     0        1911    +4 P      1923    −1        1935    −4


We have marked with P and T the peaks and troughs of the series. The observed
number of turning-points is 31 in a series of 48 terms. The expected number is, from
(21.68), ⅔(48 − 2) = 30.67, almost exactly the number observed. No test of significance
is required.

The duration of phases is:

Duration of phase (d)       1       2    3 and over    Total
Observed                   20       6        4           30
Predicted (21.75)        18.75    8.07     3.18        30.00

Here, again, a test is hardly necessary. We find, in fact, χ² = 0.826, 6/7 of which for
ν = 2 is not significant.

We conclude that these tests provide no evidence against the randomness of the series 
and hence do not suggest any cyclical movement in the yields. 
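The counts of this example can be recomputed from the table. A minimal Python sketch, using the values as read from the table above, merging the equal values of 1923 and 1924 into one as described in 21.46, and assuming (our reading) that the predicted “3 and over” entry was taken as the remainder of the observed total of 30. The small difference from the printed χ² comes from rounding of the predicted values:

```python
import math

# Deviations of potato yields from a nine-year moving average, 1888-1935,
# in the order of the table (1888-1899, 1900-1911, 1912-1923, 1924-1935).
yields = [-6, 2, -4, -3, -1, 6, -2, 7, 3, -6, 2, 0,
          -7, 6, -3, -1, 2, 0, 1, -1, 8, 4, 3, 4,
          -15, 3, 2, 1, -2, 5, 4, -4, -3, -9, 11, -1,
          -1, 2, -9, -3, 9, 5, 1, -10, 1, 2, 5, -4]


def expected_phases(d, N):
    """Expected number of phases of length d in N terms, from (21.73)."""
    return 2 * (N - d - 2) * (d * d + 3 * d + 1) / math.factorial(d + 3)


# Treat equal neighbours (1923 and 1924) as a single value.
series = [x for i, x in enumerate(yields) if i == 0 or x != yields[i - 1]]

# Positions of peaks and troughs, and the phase lengths between them.
tp = [i for i in range(1, len(series) - 1)
      if (series[i] - series[i - 1]) * (series[i + 1] - series[i]) < 0]
phases = [b - a for a, b in zip(tp, tp[1:])]
observed = [phases.count(1), phases.count(2), sum(1 for d in phases if d >= 3)]

N = 48                                    # the example uses all 48 years
expected_tp = 2 * (N - 2) / 3             # (21.68)
pred = [expected_phases(1, N), expected_phases(2, N)]
pred.append(len(phases) - sum(pred))      # "3 and over" as the remainder
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, pred))

print(len(tp), observed, round(expected_tp, 2), [round(e, 2) for e in pred], round(chi2, 3))
# 31 [20, 6, 4] 30.67 [18.75, 8.07, 3.18] 0.822
```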
21.46. In the foregoing example we have treated the two values in 1923 and 1924
as a single value since they are equal. These so-called “ties” frequently occur in ranking
work and are a great nuisance. In the present case there is only one, and any reasonable
method of treating it will not affect the test. Where “ties” are numerous enough to
make a serious difference some systematic method of treating them is desirable, particularly
if more than two individuals are tied. They may be treated as a single observation, as
in this case (although it would probably be better then to reduce N accordingly); or,
preferably, they may be counted as a mean value, e.g. with a tied pair we should consider 
the first as greater than the second and then the second greater than the first, counting the 
number of turning-points or phases as one-half in each case and adding the two together. 
This, as in all similar ranking problems, makes the theoretical discussion of sampling very 
complicated, and if it is desired to make a precise use of significance tests a further possi- 
bility is to assume that the tied members are ranked in the order most unfavourable to 
the hypothesis under test, so as to be on the safe side. 

Conditional Tests

21.47. When several unknown parameters are concerned, it may be difficult to find 
a sampling distribution dependent only on one of them which will form a basis for estimation 
or a test of significance. Sometimes, however, we can get rid of undesirable parameters 
by restricting the distribution in some way, and particularly by considering a distribution 
of samples which have some specified quality in common with the observed sample. Such 
distributions we shall, in Bartlett’s phrase, call conditional. Fisher expresses a similar 
idea by speaking of samples which have the same configuration. 

The most important application of this principle is in the testing of regression 
coefficients, which we shall consider in the next chapter. Here we give a simple illustration 
of the method for the Poisson distribution. 


Example 21.12 

Suppose we have two samples from populations which are known to give the Poisson 
type of distribution but may have different parameters. We wish to determine whether 
the populations could be identical. 

Suppose the frequencies of successes in the two samples are $r_1$ and $r_2$. If λ is the
parameter of the parent (assumed the same for each), the probabilities of the samples are

$$e^{-\lambda}\frac{\lambda^{r_1}}{r_1!} \quad\text{and}\quad e^{-\lambda}\frac{\lambda^{r_2}}{r_2!},$$

and their joint probability is accordingly

$$P\{r_1, r_2 \mid \lambda\} = e^{-2\lambda}\frac{\lambda^{r_1 + r_2}}{r_1!\,r_2!}. \qquad (21.78)$$

This depends on λ and does not help us in answering the question. However, for the
probability of a sample with $r_1 + r_2$ successes we have (since the sum of two Poisson variates
with parameters $\lambda_1$, $\lambda_2$ is distributed in the same form with parameter $\lambda_1 + \lambda_2$):

$$P\{r_1 + r_2 \mid \lambda\} = e^{-2\lambda}\frac{(2\lambda)^{r_1 + r_2}}{(r_1 + r_2)!},$$

and hence

$$\frac{P\{r_1, r_2 \mid \lambda\}}{P\{r_1 + r_2 \mid \lambda\}} = \frac{(r_1 + r_2)!}{2^{r_1 + r_2}\,r_1!\,r_2!} = \frac{r!}{2^r\,r_1!\,r_2!}, \qquad (21.79)$$

where $r = r_1 + r_2$.

Now in accordance with Bayes' theorem we have

$$P\{r_1, r_2 \mid \lambda\} = P\{r_1, r_2 \mid r_1 + r_2\}\,P\{r_1 + r_2 \mid \lambda\},$$

and hence

$$P\{r_1, r_2 \mid r_1 + r_2\} = \frac{r!}{2^r\,r_1!\,r_2!}. \qquad (21.80)$$

Consequently, if we confine our attention to samples for which the total number of successes
is r, the probability of the observed $r_1$ and $r_2$ is independent of λ and is, in fact, the
corresponding term in the binomial $(\tfrac{1}{2} + \tfrac{1}{2})^r$. The probability is clearly that of a partition of
r into the observed $r_1$ and $r_2$, and if it is small we suspect the hypothesis that the samples
emanated from the same population.

This kind of conditional inference raises the same sort of point as we noticed in 17.44. 
We decide beforehand that, whatever r turns out to be, we will make the inference in the
population of samples which yield that value of r. 
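The conditional test reduces to a binomial calculation. A minimal Python sketch with hypothetical counts $r_1 = 15$, $r_2 = 5$ (not from the text): it checks numerically that the ratio in (21.79) does not depend on λ, and computes the conditional probability of $r_1$ or more successes in the first sample. The function names are our own:

```python
import math
from fractions import Fraction


def poisson(r, lam):
    """Poisson probability of r successes with parameter lam."""
    return math.exp(-lam) * lam ** r / math.factorial(r)


def partition_prob(r1, r2):
    """P{r1, r2 | r1 + r2} = r! / (2^r r1! r2!), free of lambda, as in (21.80)."""
    r = r1 + r2
    return Fraction(math.comb(r, r1), 2 ** r)


def upper_tail(r1, r2):
    """Conditional probability, given r = r1 + r2, of r1 or more in the first sample."""
    r = r1 + r2
    return sum((partition_prob(k, r - k) for k in range(r1, r + 1)), Fraction(0))


# Hypothetical counts, for illustration only.
r1, r2 = 15, 5
for lam in (0.5, 3.0, 10.0):
    ratio = poisson(r1, lam) * poisson(r2, lam) / poisson(r1 + r2, 2 * lam)
    print(lam, ratio, float(partition_prob(r1, r2)))
print(upper_tail(r1, r2), float(upper_tail(r1, r2)))
```

For a two-sided test one would, by the symmetry of the binomial with probability ½, double this tail.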


Pitman's Tests

21.48. In the extreme conditional case we may consider an inference in a population 
of samples the members of which are the same as those actually observed, the population 
being given by permutations or partitions of the observed values. The tests of ranking 
and periodicity given above are cases of this kind. A similar procedure has been advocated
by Fisher in the analysis of variance and the design of experiments, and will be considered 
in due course. We now proceed to examine tests of the same nature proposed by Pitman 
(1937d, 1938). 

Suppose we have two sets of values $u_1, \dots, u_m$ and $v_1, \dots, v_n$ with means $\bar u$ and $\bar v$,
and the mean of the two together equal to $\bar z$. Given m + n objects, there are

$$\frac{(m+n)!}{m!\,n!}$$

ways, say N, of separating them into two sets of m and n objects, of which the given set
is one. We call $|\bar u - \bar v|$ the spread of the separation. Since

$$m\bar u + n\bar v = (m+n)\bar z,$$

we have also for the spread

$$|\bar u - \bar v| = \frac{m+n}{n}\,|\bar u - \bar z| = \frac{m+n}{mn}\,\Big|\sum u - m\bar z\Big|. \qquad (21.81)$$

Take a probability 1 − α = M/N, where M is an integer. If R is a particular separation,
and the number of separations with spread not less than that of R is not greater than M,
we call R discordant. If there are M or more with a greater spread we call it concordant.
A separation which is neither concordant nor discordant is called neutral. If m = n the
separations occur in pairs with equal spreads, and we then take M to be even. The
discordant separations are most easily picked out as those with the largest values of
$|\sum u - m\bar z|$.

If the observed separation is arrived at by chance, the probability that it is discordant
is M/N = 1 − α when there are no neutral separations. If such exist, the probability
is less than 1 − α. Similarly the probability that a separation is concordant is α,
or less, as the case may be.

Two samples $u_1, \dots, u_m$ and $v_1, \dots, v_n$ are said to be discordant, concordant or
neutral according as the separations u and v are so. Having selected our significance
points dependent on α, and hence having fixed M, we can find for what values of the spreads
a pair of samples is discordant or otherwise, and hence whether our observed pair is so.
If they are discordant we reject the hypothesis that they came from the same population.


Example 21.13 (Pitman, 1937a) 

Two samples have the following values : — 

0 , 11 , 12 , 20 

16, 19, 22, 24, 29. 

Are they significantly different ? 

There are 9 members altogether and hence $\binom{9}{4} = 126$ separations into samples of
five and four. We take α to be as near as possible to 0.95, corresponding to a 5-per-cent
level of significance, and hence M = 6. We then find the groups which have the largest
values of the spread. We have $\bar z = 17$, so that $m\bar z = 68$, and using the form $|\sum u - 68|$
we find those groups of four from

0, 11, 12, 16, 19, 20, 22, 24, 29,

which give the maximum value to this quantity. They are:







Group              |Σu − 68|
0, 11, 12, 16          29
0, 11, 12, 19          26
0, 11, 12, 20          25
29, 24, 22, 20         27
29, 24, 22, 19         26
29, 24, 20, 19         24


The group 0, 11, 12, 20 gives the fifth largest spread, and so with M = 6 the observed
separation is discordant. Our inference is that the samples come from different popula- 
tions. Only in four other cases out of 126 should we get so large a spread in samples from 
the same population. 
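For small samples the enumeration of Example 21.13 is easily mechanised. A minimal Python sketch that lists all 126 ways of choosing four of the nine values and counts those whose spread is at least as great as the observed one; the names are our own:

```python
from itertools import combinations

u = [0, 11, 12, 20]
v = [16, 19, 22, 24, 29]
pooled = u + v
m = len(u)
m_zbar = m * sum(pooled) / len(pooled)      # 4 * 17 = 68


def spread_key(group):
    """|sum(u) - m * zbar|, proportional to the spread |u_bar - v_bar| by (21.81)."""
    return abs(sum(group) - m_zbar)


observed = spread_key(u)
keys = [spread_key(g) for g in combinations(pooled, m)]
at_least = sum(1 for k in keys if k >= observed)
print(len(keys), at_least, round(at_least / len(keys), 4))  # 126 5 0.0397
```

Five separations, the observed one included, reach the observed spread, so with M = 6 the observed separation is discordant.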


21.49. The extended use of the above test is barred by practical inconvenience,
but an approximate form based on a different measure of discordance may be used. We
now put

$$w = \frac{m(\bar u - \bar z)^2}{(N - m)\,\sigma^2}, \qquad (21.82)$$

where σ² is the variance of the samples taken together and is thus a constant. The function
w is hence linear in $(\bar u - \bar z)^2$, the device of squaring, as usual, getting rid of difficulties
associated with the use of the modulus $|\bar u - \bar z|$. N here refers to the total sample
m + n.

Now, for the moments of $\bar u - \bar z$ we may use the results of 11.26 (vol. I, p. 284), giving
the moments of the mean in sampling from a finite population; for $\bar z$ is the population
mean. Replacing n in the formulae of that section by m and putting N = m + n,
we have

$$
\begin{aligned}
E(\bar u - \bar z) &= 0,\\
E(\bar u - \bar z)^2 &= \frac{N - m}{m(N - 1)}\,\mu_2,\\
E(\bar u - \bar z)^4 &= \frac{N - m}{m^3(N-1)(N-2)(N-3)}\big[\{N^2 + N - 6m(N-m)\}\mu_4 + 3N(N-m-1)(m-1)\mu_2^2\big],
\end{aligned}
$$

and hence for the first two moments of w we find

$$E(w) = \frac{1}{N - 1}, \qquad (21.83)$$

$$E(w^2) = \frac{3}{N^2 - 1}(1 + \theta), \qquad (21.84)$$

where

$$\theta = \frac{6 + (N+1)\gamma_2}{3(N-2)(N-3)}\left\{\frac{N(N+1)}{m(N-m)} - 6\right\}, \qquad (21.85)$$

$\gamma_2$ referring to the measure of kurtosis $\mu_4/\mu_2^2 - 3$.

For fixed N the modulus of the second factor in (21.85) will be found to have a maximum
$2(N-2)/N$ when $m = \tfrac{1}{2}N$, and it takes this value again at

$$\frac{N - 2m}{N} = \pm\sqrt{\frac{N - 2}{2N - 1}},$$

giving $m/(N - m) = \tfrac{1}{5}$ or 5 for N = 14, and wider limits for larger N. It will also be found
that for N ≥ 6 the factor $N(N+1)/\{m(N-m)\} - 6$ is not greater in absolute value than
$2(N-2)/N$ if

$$\tfrac{1}{4} \le \frac{m}{N - m} \le 4,$$

i.e. unless one sample is more than four times as big as the other. Thus for such values,
and $\gamma_2$ not large, θ is small, and approximately

$$E(w^2) = \frac{3}{N^2 - 1}. \qquad (21.86)$$

Similarly, using the fact that for large m and N

$$E(\bar u - \bar z)^{2r} \simeq 1 \cdot 3 \cdot 5 \cdots (2r - 1)\left\{\left(1 - \frac{m}{N}\right)\frac{\mu_2}{m}\right\}^r,$$

we find approximately

$$E(w^3) = \frac{3 \cdot 5}{(N-1)(N+1)(N+3)}. \qquad (21.87)$$

The moments given by (21.83), (21.86) and (21.87) are those of the B-distribution

$$dF = \frac{1}{B\{\tfrac{1}{2}, \tfrac{1}{2}(N-2)\}}\,w^{-1/2}(1 - w)^{(N-4)/2}\,dw, \qquad 0 \le w \le 1, \qquad (21.88)$$
which can therefore be used to approximate to the distribution of w. In point of fact the
approximation seems to be remarkably close.

w may also be written

$$w = \frac{\dfrac{mn}{m+n}(\bar u - \bar v)^2}{\sum (u - \bar u)^2 + \sum (v - \bar v)^2 + \dfrac{mn}{m+n}(\bar u - \bar v)^2}, \qquad (21.89)$$

which shows that w ≤ 1.
We also have

$$\frac{w}{1 - w} = \frac{\dfrac{mn}{m+n}(\bar u - \bar v)^2}{\sum (u - \bar u)^2 + \sum (v - \bar v)^2}, \qquad (21.90)$$

and it is instructive to observe that the function on the right is the same as that of
$t^2/(m + n - 2)$ in (21.32), with a few changes of notation. A transformation of (21.88) to
“Student's” form will in fact show that we can test $\sqrt{\nu w/(1 - w)}$ in the t-distribution with
ν = m + n − 2; for (21.88) then becomes

$$dF \propto \frac{dt}{(1 + t^2/\nu)^{(\nu+1)/2}}, \qquad (21.91)$$

where

$$t = \sqrt{\frac{\nu w}{1 - w}}, \qquad \nu = m + n - 2. \qquad (21.92)$$
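Applied to the data of Example 21.13, the approximation of 21.49 can be set beside the exact enumeration. A minimal Python sketch computing w from (21.89) and the Student's t obtained from w/(1 − w), and checking that this t coincides with the usual pooled two-sample t; the names are our own, and the tail probability itself would still be read from a t-table:

```python
import math


def pitman_w(u, v):
    """Return w of (21.89), the Student's t derived from it, and nu = m + n - 2."""
    m, n = len(u), len(v)
    ubar, vbar = sum(u) / m, sum(v) / n
    zbar = (sum(u) + sum(v)) / (m + n)
    between = m * n / (m + n) * (ubar - vbar) ** 2
    total = sum((x - zbar) ** 2 for x in u + v)    # within + between
    w = between / total                             # 0 <= w <= 1
    nu = m + n - 2
    t = math.sqrt(nu * w / (1 - w))
    return w, t, nu


def pooled_t(u, v):
    """Classical two-sample t with pooled variance, for comparison."""
    m, n = len(u), len(v)
    ubar, vbar = sum(u) / m, sum(v) / n
    ss = sum((x - ubar) ** 2 for x in u) + sum((x - vbar) ** 2 for x in v)
    s2 = ss / (m + n - 2)
    return abs(ubar - vbar) / math.sqrt(s2 * (1 / m + 1 / n))


u = [0, 11, 12, 20]
v = [16, 19, 22, 24, 29]
w, t, nu = pitman_w(u, v)
print(round(w, 4), round(t, 3), nu, round(pooled_t(u, v), 3))
```

Here t is about 2.56 on 7 degrees of freedom, to be compared with the exact permutation proportion 5/126 of Example 21.13.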


21.50. A test of a similar kind may be evolved for the product-moment correlation.
Suppose we have two samples $x_1, \dots, x_n$ and $y_1, \dots, y_n$, and calculate

$$r = \frac{\operatorname{cov}(x, y)}{\sqrt{\operatorname{var} x \operatorname{var} y}}$$

for every possible pairing of the x's and y's, n! in number. As before, if we choose an
α and hence a number M such that 1 − α = M/n!, we may determine those pairings for
which r is greatest and reject the hypothesis that x and y are independent in such cases
if they fall among the M greatest. Since the denominator of r is constant, this is equivalent
to attributing significance to the values of $|\sum xy - n\bar x\bar y|$ which exceed a given value
determined by α.

Taking $\bar x = \bar y = 0$, without loss of generality, we find

$$E(r) = 0, \qquad (21.93)$$

$$E(r^2) = \frac{E(\sum xy)^2}{n^2 \operatorname{var} x \operatorname{var} y} = \frac{1}{n - 1}, \qquad (21.94)$$
and similarly, if $\gamma_1$, $\gamma_2$ are the modified measures of skewness and kurtosis for x (expressed
in terms of k-statistics, i.e. $\gamma_1 = k_3/k_2^{3/2}$, $\gamma_2 = k_4/k_2^2$) and $\gamma_1'$, $\gamma_2'$ those for y, it will be found that

$$E(r^3) = \frac{n - 2}{n(n-1)^2}\,\gamma_1\gamma_1', \qquad (21.95)$$

$$E(r^4) = \frac{3}{(n-1)(n+1)} + \frac{(n-2)(n-3)}{n(n+1)(n-1)^3}\,\gamma_2\gamma_2'. \qquad (21.96)$$

Thus, neglecting the terms in the γ's, we have

$$E(r) = E(r^3) = 0, \qquad E(r^2) = \frac{1}{n - 1}, \qquad E(r^4) = \frac{3}{(n-1)(n+1)}. \qquad (21.97)$$

These are the first four moments of the distribution

$$dF = \frac{1}{B\{\tfrac{1}{2}, \tfrac{1}{2}(n-2)\}}\,(1 - x^2)^{(n-4)/2}\,dx, \qquad -1 \le x \le 1. \qquad (21.98)$$

Thus r may be tested in this distribution or equivalently, putting

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}, \qquad (21.99)$$

in Student's form with ν = n − 2.

In particular, if the numbers x and y reduce to rankings, we have the test already
introduced in 21.41. Compare also the result given for the distribution of Spearman's
ρ in 16.18 (vol. I, p. 401).
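The moments (21.93) to (21.96) refer to the n! equally likely pairings, so for small n they can be checked exactly by enumeration. A minimal Python sketch with arbitrary made-up data (names are our own), computing $\gamma_1$ and $\gamma_2$ from the usual k-statistics:

```python
from itertools import permutations


def centred(xs):
    mean = sum(xs) / len(xs)
    return [x - mean for x in xs]


def k_gammas(xs):
    """gamma1 = k3 / k2**1.5 and gamma2 = k4 / k2**2 from k-statistics."""
    n = len(xs)
    d = centred(xs)
    s2 = sum(v ** 2 for v in d)
    s3 = sum(v ** 3 for v in d)
    s4 = sum(v ** 4 for v in d)
    k2 = s2 / (n - 1)
    k3 = n * s3 / ((n - 1) * (n - 2))
    k4 = (n * (n + 1) * s4 - 3 * (n - 1) * s2 ** 2) / ((n - 1) * (n - 2) * (n - 3))
    return k3 / k2 ** 1.5, k4 / k2 ** 2


def permutation_moments(x, y):
    """Raw moments E(r^j), j = 1..4, of r over all n! pairings of x with y."""
    dx, dy = centred(x), centred(y)
    scale = (sum(a * a for a in dx) * sum(b * b for b in dy)) ** 0.5
    rs = [sum(a * b for a, b in zip(dx, perm)) / scale for perm in permutations(dy)]
    return [sum(r ** j for r in rs) / len(rs) for j in (1, 2, 3, 4)]


x = [1, 2, 4, 7, 11, 3]          # made-up data
y = [2, 5, 1, 8, 3, 9]
n = len(x)
m1, m2, m3, m4 = permutation_moments(x, y)
g1x, g2x = k_gammas(x)
g1y, g2y = k_gammas(y)
print(m2, 1 / (n - 1))
print(m3, (n - 2) / (n * (n - 1) ** 2) * g1x * g1y)
print(m4, 3 / ((n - 1) * (n + 1)) + (n - 2) * (n - 3) / (n * (n + 1) * (n - 1) ** 3) * g2x * g2y)
```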


The Combination of Tests 

21.51. It sometimes happens that we have a number of tests of significance, all 
yielding various probabilities, which we wish to express as a single probability. Suppose, 
for instance, that we conduct an experiment five times and that some test, such as that 
of the mean, gives probabilities to the observed deviations of 0.2, 0.8, 0.01, 0.1, 0.03. In
the ordinary way two of these values would be regarded as significant and the other three 
not. What conclusion are we to draw as to the five taken together ? 

Suppose we have k values of the probability, $p_1, \dots, p_k$. The distribution of any
particular p is rectangular, i.e.

$$dF = dp, \qquad 0 \le p \le 1.$$

Hence, if $x = -\log p$, the distribution of x is

$$dF = e^{-x}\,dx, \qquad 0 \le x < \infty,$$

and its characteristic function is $(1 - it)^{-1}$. Hence if we write

$$\lambda = -\sum_{j=1}^{k} \log p_j, \qquad (21.100)$$

the distribution of λ has a characteristic function

$$\frac{1}{(1 - it)^k}, \qquad (21.101)$$

and is therefore given by

$$dF = \frac{1}{\Gamma(k)}\,\lambda^{k-1}e^{-\lambda}\,d\lambda. \qquad (21.102)$$

Putting

$$\chi^2 = 2\lambda = -2\sum \log p = -2\log \prod p,$$

we see that the distribution of χ² is

$$dF \propto \exp(-\tfrac{1}{2}\chi^2)\,(\chi^2)^{k-1}\,d(\chi^2), \qquad (21.103)$$

or χ² is distributed as χ² with ν = 2k degrees of freedom.
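The combination rule is short enough to write out in full. A minimal Python sketch (the function name is our own); because ν = 2k is even, the χ² tail has the closed form $e^{-\chi^2/2}\sum_{j<k}(\chi^2/2)^j/j!$, so no table is needed:

```python
import math


def fisher_combined(pvalues):
    """Combine k independent probabilities: chi2 = -2 * sum(ln p) on 2k d.f.

    For even d.f. the upper tail has the closed form
    exp(-chi2/2) * sum_{j<k} (chi2/2)**j / j!."""
    k = len(pvalues)
    chi2 = -2 * sum(math.log(p) for p in pvalues)   # natural logarithms
    half = chi2 / 2
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return chi2, 2 * k, tail


# The five probabilities quoted at the start of 21.51.
chi2, nu, tail = fisher_combined([0.2, 0.8, 0.01, 0.1, 0.03])
print(round(chi2, 2), nu, round(tail, 4))  # 24.49 10 0.0064
```

With a single probability the rule returns that probability unchanged, which is a useful sanity check.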


Example 21.14 (K. Pearson, 1933b, quoting data from E. M. Elderton, 1933).

Pairs of boys were selected in various age-groups and one member of each pair fed 
on raw, the other on pasteurised milk. The differences in gain in weight are shown in
the following table, together with the standard errors of the differences based on large- 
sample theory. 


   (1)          (2)          (3)                (4)            (5)              (6)
Age-group     Number of    Mean difference    Standard       Probability of   log₁₀ p_j
(central      pairs        in weight gained,  error of       observed
value in                   raw less           difference     difference or
years)                     pasteurised                       greater, p_j

   6½            73          −0.066            0.054          0.8888           1̄.9488
   7½            76          +0.022            0.053          0.3409           1̄.5326
   8½            71          −0.003            0.052          0.5239           1̄.7193
   9½            77          +0.011            0.056          0.4207           1̄.6240
  10½            60          +0.002            0.057          0.4840           1̄.6849

                                                              Total            2̄.5096


The values of $p_j$ in column (5) are obtained by expressing the observed deviations in column
(3) in terms of the standard error in column (4) and hence determining the probability
from the normal integral. We have

$$\chi^2 = -2\sum \log_e p = -\frac{2}{\log_{10} e}\sum \log_{10} p = 6.86, \qquad \nu = 10.$$

The probability of a value of χ² ≥ 6.86 for ν = 10 is about 0.74, and the test as a whole
does not support the hypothesis of a differential effect on feeding between the two kinds
of milk.
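The arithmetic of Example 21.14 can be reproduced from columns (3) and (4) alone. A minimal Python sketch, assuming, as the text describes, one-sided probabilities from the normal integral in the direction “raw greater than pasteurised”; small differences from column (5) reflect rounding in the printed differences and standard errors, and the names are our own:

```python
import math

# Columns (3) and (4): mean difference (raw less pasteurised) and its standard error.
diffs = [-0.066, 0.022, -0.003, 0.011, 0.002]
std_errors = [0.054, 0.053, 0.052, 0.056, 0.057]


def upper_normal_tail(z):
    """P(Z >= z) for a standard normal variate Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


ps = [upper_normal_tail(d / s) for d, s in zip(diffs, std_errors)]
chi2 = -2 * sum(math.log(p) for p in ps)          # natural logarithms
half, k = chi2 / 2, len(ps)
tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))  # nu = 2k = 10
print([round(p, 4) for p in ps], round(chi2, 2), round(tail, 2))
```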
Nuisance Parameters

21.52. From the foregoing it will have been clear that in the theories of both estimation
and significance one of the main problems is to find a distribution which is independent
of certain unknown parameters in the parent population. Parameters of this kind,
necessary as they are in the specification of the parent and the precise formulation of our problem,
can be a nuisance when we are seeking to make exact statements about some other
parameter on which interest is focussed. For this reason they have been named nuisance
parameters. It may be useful if at this point we summarise the methods available for
getting rid of them.

(a) First of all there is the process of “Studentisation”, whereby we can remove
scale parameters from the sampling distribution by a suitable choice of statistic. (Cf.
19.26.)

(b) Secondly, we may restrict the inference to a sub-population which is conditioned
by having certain values in common with the observed sample. It sometimes happens 
that the distribution in this sub-population does not contain the nuisance parameters, 
whereas a distribution in the full population would do so (21.47). 

(c) In the comparison of two samples, or even the testing of a single sample involving
an unknown mean, that parameter may be eliminated by differencing (21.27). As regards
the case of the single sample, it is clear that if $x_1, \dots, x_n$ are independent and n is even,
the values $x_1 - x_2,\ x_3 - x_4,\ \dots,\ x_{n-1} - x_n$ will also be independent and be distributed
with zero mean (though of course there are only $\tfrac{1}{2}n$ of them).

(d) Transformations of the variate may sometimes either eliminate the nuisance 
parameter altogether or reduce its importance. The most noteworthy case is Fisher’s 
transformation of the correlation coefficient (14.18, vol. I, p. 346). The transformed 
function z − ζ is distributed nearly normally with variance 1/(n − 3), so that the difference
of two correlations when transformed does not involve the common value of ζ.
(Cf. Example 14.8.) 

(e) We may find distributions which are independent of the unknown parameters,
and even of the population, by using the methods of ranking or considering partitions
(21.41, 21.48).

(f) The fiducial argument, in at least one known case, gives a test independent of
unknown parameters, namely the Behrens test (20.13).

It must be realised, however, that all these types of inference do not stand on equal 
footings. In particular (e) requires further examination, as we proceed to show. 

21.53. We may now review the many different tests which have been described in 
this chapter and consider more closely the type of reasoning on which they are based. 
We may group our tests broadly into two classes, those which give a direct test of a given 
value of a parent parameter and those which do not. 

The first class rests on a type of inference which we have discussed fully in connection 
with the problem of estimation. There is, in fact, only a difference in viewpoint, and little 
or none in essential ideas, between estimating a parameter by assigning a range to accept- 
able values (whether by confidence intervals or fiducial intervals) and ascertaining whether 
some prior value lies in that range. The significance of parameters in large samples, the 
test of the mean in normal samples by “Student's” distribution, the test of a correlation
coefficient in normal samples, and others of the same kind relating to a specified parameter 
have the same logical foundation as the theory of confidence intervals or the theory of 
fiducial intervals, whichever is preferred. They all provide for the consideration of alternative
values of the parameter. 

21.54. The second group of tests are not, on the face of it, concerned with the value 
of a parameter in a parent population, and some of them take no account of possible alter- 
native hypotheses. Consider, for example, a test of normality or a test of randomness. 
The hypothesis is that the population is normal or the sampling is random, as the case 
may be, but this does not specify a parameter. What alternatives to normality or to 
randomness are we considering, if any? We must have the existence of such alternatives
in mind, however vaguely, for otherwise we should not be testing these particular
hypotheses. But can we say what they are? And if not, do our inferences remain valid?
When working with a probability α shall we still be right in a proportion α of the cases in
the long run?

21.55. The kind of argument we have used in all these cases is this: on the given
hypothesis the observed sample and all samples providing a greater value of the statistic
being used for the test have a small probability. Therefore we reject the hypothesis. 

We may note at once that in rejecting the hypothesis we do so in favour of another 
hypothesis for which the observations are more probable. We may not express this thought 
explicitly, but it is there. The various statistics we use for testing normality, for instance
$b_1$, can arise with greater probability from other populations which are skew or have a
marked deviation from mesokurtosis ; the fact is assumed as self-evident (as indeed it 
is) and hence, if the statistic is improbable for the normal case there will be non-normal 
cases of greater probability. We remark, nevertheless, that the actual probability a is 
calculated on the normal hypothesis and does not hold for the non-normal cases. Thus 
we can no longer assert that we are right in proportion α of the cases. We are therefore
relying on a less definite principle of inference to the effect that we reject a hypothesis 
which gives an improbable value to observation, provided that there exists some other 
hypothesis which gives a more probable value. 

21.56. A similar argument applies to tests of randomness. It is obvious that many 
other methods of generating a series exist which give a greater probability to a systematic 
series than the random method, and in rejecting the latter we do so more or less consciously 
in favour of the former. Our intuitive feelings on the point lead us to apply one test when 
we have the possibility of systematic order in mind (the ranking test) and another when 
we are interested in oscillations (the phase test). What we are doing, in effect, is selecting 
the test of randomness which we feel to discriminate best between the hypothesis of 
randomness and the alternative possibilities. 

21.57. Although, therefore, much remains to be done in putting tests of normality, 
randomness and goodness of fit on a formal logical basis, there do not appear to be any 
serious difficulties in doing so insofar as the specification of alternative hypotheses is concerned. But there remains the difficulty hinted at at the beginning of 21.55. In the 
majority of cases we have a probability $1 - \alpha$ that the observed statistic $t_0$ will be exceeded, 
and if this is small we reject the hypothesis. But why exceeded? Why reject the hypothesis 
because of the improbability of a number of events which have not happened? 

Here also it seems that a closer inquiry into the logic of the process would be worth 
while. We have seen how it can be justified by confidence-interval or fiducial theory 
when a parameter is under consideration. When no parameter is specified, the process 
must, in the present state of our knowledge, rest on more intuitive ideas. My own view 
is that, in a vague kind of way, we are really considering the range of values of a parameter 
without realising it. In selecting a statistic to carry out the test, we usually relate it to 
the sort of effect we are expecting to divert the real state of affairs from that of 
our hypothesis. For instance, if we suspect cyclical effects in a random series we base 
a test on oscillations in that series. The further the series deviates from randomness, the 
greater will be the value of our statistic; and consequently, if we could measure deviation 
from randomness (in the direction of cyclicality), we should have a parameter which could 
be located in a range in the manner of confidence intervals. Such a range would exclude 
the larger values of our statistic if it can be regarded in any sense as estimating the parameter (or, more generally, as increasing with it); and hence the procedure of rejecting the 
hypothesis if the statistic is among these large values may be justified. 

21.58. It is for this reason that we began the chapter by defining tests of significance 
in relation to a parameter-value given a priori. It seems probable that in the ultimate 
analysis no other definition will be satisfactory. The fact that in this chapter we have 
given tests of hypotheses which do not appear to specify a parameter value is, I think, 
merely a reflection of the fact that the nature of those hypotheses and the inferences about 
them are not usually understood clearly but are based on more or less intuitive ideas. It 
is probable that many of these ideas are sound and can be given explicit logical foundation ; 
but the matter awaits investigation by the statistical logician. 

21.59. There remains for consideration the type of inference used in Pitman’s tests 
(21.48 and 21.49). These are of the character of tests of randomness. Given a set of 
values, we consider all the arrangements in which they could have happened and reject 
the hypothesis if the observed arrangement is improbable. Here again, as it seems to me, 
there is a suppressed series of alternative hypotheses which would make the observed 
value more probable; and in choosing the test, such as the “ spread ” or the high value 
of a correlation, we are intuitively relating the magnitude of a statistic to the deviation 
from randomness. Pitman himself has shown, however, that when the hypothesis is 
definite and specifies the difference of two means, the tests give confidence intervals in the 
ordinary way (cf. Exercise 21.16). 
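The randomisation argument of 21.59 can be illustrated numerically. What follows is a minimal Python sketch, not Pitman's own formulation. It assumes the criterion is the absolute difference of the two sample means. It then enumerates every division of the pooled values into groups of the observed sizes, taking each division as equally probable under the hypothesis of randomness. The function name `permutation_p_value` is our own.

```python
from itertools import combinations
from statistics import mean


def permutation_p_value(x, y):
    """Exact two-sided randomisation test for a difference of means.

    Every division of the pooled values into groups of sizes len(x) and
    len(y) is treated as equally probable; the result is the proportion of
    divisions whose absolute difference of group means is at least the
    observed one.
    """
    pooled = list(x) + list(y)
    n, n_x = len(pooled), len(x)
    total = sum(pooled)
    observed = abs(mean(x) - mean(y))
    extreme = 0
    divisions = 0
    for chosen in combinations(range(n), n_x):
        s_x = sum(pooled[i] for i in chosen)
        diff = abs(s_x / n_x - (total - s_x) / (n - n_x))
        divisions += 1
        if diff >= observed - 1e-12:  # small tolerance for floating-point ties
            extreme += 1
    return extreme / divisions


print(permutation_p_value([1, 2, 3], [4, 5, 6]))  # 0.1: only 2 of the 20 divisions are as extreme
```

Full enumeration is only practicable for small samples. It is used here because it mirrors the "all the arrangements" of the text directly.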

We shall resume the general theory of tests of significance in Chapter 26. 

NOTES AND REFERENCES 

For the use of the t-distribution in non-normal cases see Geary (1936b) and Bartlett 
(1935a), the latter of whom shows that, for moderate samples, departures from mesokurtosis are not very serious. For approximations to t in the normal case see Hendricks 
(1936) and Hotelling and Frankel (1938). For approximations to the z-distribution see 
Cochran (1940a), Cornish and Fisher (1937), and Paulson (1942). See also references to 
Chapter 23. 

For the further theory of the χ²-test see Neyman and Pearson (1928, 1931a) and for 
another test of goodness of fit Neyman (1937a). The theory of 21.44 has been studied 
by a number of writers, notably by André (1884), Kermack and McKendrick (1936, 1937), 
and Wallis and Moore (1941). 

The amalgamation of tests given in 21.51 was apparently first given by Fisher in an 
early edition of Statistical Methods for Research Workers and was studied in detail by 
K. Pearson (1933b) under the title of the $P_\lambda$-test, and by E. S. Pearson (1938). 

For a test of significance of the difference of two variances in samples from a bivariate 
normal population see Hirschfeld (1937), Finney (1938), Pitman (1939c), Morgan (1939), 
and De Lury (1938); and see Exercise 21.3. 

For the tests by Pitman, see his papers of 1937a, 1938. The similar problem in the 
testing of homogeneity in the analysis of variance has also been studied; see references 
to Chapters 23 and 24. 

For the test of difference of means when variances are unequal, from the point of view 
of confidence intervals, see Welch (1938b) and the appendix to this paper by Miss Tanbum. 


EXERCISES 


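21.1. For the population represented approximately by 

$$dF = \frac{1}{\sqrt{2\pi}}\,e^{-\frac12 x^2}\left\{1 + \frac{\kappa_3}{6}(x^3 - 3x)\right\}dx,$$

show that, if $\kappa_3^2$ is negligible, the joint probability of a sample $x_1, \ldots, x_n$ differs from that 
if $\kappa_3$ is zero by a term 

$$\frac{\kappa_3}{6(2\pi)^{\frac12 n}}\sum_{i=1}^{n}(x_i^3 - 3x_i)\exp\left(-\tfrac12\sum_{i=1}^{n}x_i^2\right)dx_1 \ldots dx_n.$$

By the transformation 

$$y_1 = \frac{1}{\sqrt 2}(x_1 - x_2), \qquad y_2 = \frac{1}{\sqrt 6}(x_1 + x_2 - 2x_3), \qquad \ldots, \qquad y_n = \frac{1}{\sqrt n}(x_1 + x_2 + \cdots + x_n)$$

and the further transformation 

$$\begin{aligned}
y_1 &= \rho\sin\phi_{n-3}\sin\phi_{n-4}\cdots\sin\phi_1\sin\phi_0,\\
y_2 &= \rho\sin\phi_{n-3}\sin\phi_{n-4}\cdots\sin\phi_1\cos\phi_0,\\
y_3 &= \rho\sin\phi_{n-3}\sin\phi_{n-4}\cdots\cos\phi_1,\\
&\;\;\vdots\\
y_{n-1} &= \rho\cos\phi_{n-3},
\end{aligned}$$

show that the corrective term to the distribution of “ Student’s ” t is 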

dt 


r (r.’’"’ - 1”"’) {-¥(* 


and hence obtain equation (21.11). 


(Geary, 1936b.) 


21.2. By the polar transformation of the type of the previous exercise applied to 
all n variates show that if a random sample is drawn from a normal population with zero 
mean the frequency element may be written as 

$$dF \propto \rho^{n-1}e^{-\rho^2/2\sigma^2}\,d\rho\,d\phi_0\,\sin\phi_1\,d\phi_1\,\sin^2\phi_2\,d\phi_2\cdots\sin^{n-2}\phi_{n-2}\,d\phi_{n-2}.$$

Hence if $w = \Sigma|x|/(ns)$, where $s^2$ is the sample variance, the distribution of w is independent 

/2 

of that of s. Hence show that for the distribution of w, writing ω = w 


IH 


+ 2 " 

’ /’{» + !) -y/n 


fit = — + a* } 


+ 3»(*> + 

lit =— (3n<i> + (8o* + 3) + 6a* »<*> + a* »W} 

n* ^ I n 

Hence show that for n = 50, — — 0·24 and $\beta_2$ = 3·10, indicating a fairly rapid tendency 
to normality. 

(Geary, 1935a.) 


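21.3. Show that in samples from a normal bivariate population 

$$dF \propto \exp\left\{-\frac{1}{2(1-\rho^2)}\left(\frac{x^2}{\sigma_1^2} - \frac{2\rho xy}{\sigma_1\sigma_2} + \frac{y^2}{\sigma_2^2}\right)\right\}dx\,dy,$$

the functions 

$$\xi = \frac{x}{\sigma_1} + \frac{y}{\sigma_2}, \qquad \eta = \frac{x}{\sigma_1} - \frac{y}{\sigma_2}$$

are distributed independently and that their correlation coefficient R may be written 

$$R = \frac{a - \lambda}{\sqrt{\{(a + \lambda)^2 - 4a\lambda r^2\}}},$$

where 

$$a = \frac{\Sigma(x - \bar x)^2}{\Sigma(y - \bar y)^2}, \qquad \lambda = \frac{\sigma_1^2}{\sigma_2^2},$$

and r is the correlation between the observed x's and y's. Hence show that 

$$t = \frac{R\sqrt{(n-2)}}{\sqrt{(1 - R^2)}} = \frac{(a - \lambda)\sqrt{(n-2)}}{\sqrt{\{4(1 - r^2)a\lambda\}}}$$

is distributed as “ Student’s ” t with n − 2 degrees of freedom. Show how to test the 
ratio λ from this result. 

(Pitman, 1939c. The test has the remarkable property of being independent of the 
parent correlation ρ.) 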
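As a numerical check on the algebra of Exercise 21.3, the sketch below computes t in two ways. It first uses a and r, and then uses the sample correlation R of ξ and η. The function names are our own. The sketch assumes the hypothetical ratio λ = σ₁²/σ₂² is supplied, and it takes σ₂ = 1, σ₁ = √λ, since only the ratio enters R.

```python
import math


def _sums(u, v):
    """Return n and the sums of squares and products about the means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return n, suu, svv, suv


def pitman_t(x, y, lam):
    """t of Exercise 21.3 for the hypothesis sigma_1**2 / sigma_2**2 == lam."""
    n, sxx, syy, sxy = _sums(x, y)
    a = sxx / syy                      # sample ratio of sums of squares
    r = sxy / math.sqrt(sxx * syy)     # sample correlation of x and y
    return (a - lam) * math.sqrt(n - 2) / math.sqrt(4 * a * lam * (1 - r * r))


def pitman_t_via_xi_eta(x, y, lam):
    """The same statistic from the correlation R of xi and eta."""
    s1 = math.sqrt(lam)                # sigma_1 relative to sigma_2 = 1
    xi = [a / s1 + b for a, b in zip(x, y)]
    eta = [a / s1 - b for a, b in zip(x, y)]
    n, sa, sb, sab = _sums(xi, eta)
    big_r = sab / math.sqrt(sa * sb)
    return big_r * math.sqrt(n - 2) / math.sqrt(1 - big_r * big_r)


x = [1.0, 2.0, 4.0, 3.0, 6.0, 5.0]
y = [2.0, 1.0, 3.0, 5.0, 4.0, 7.0]
print(pitman_t(x, y, 1.5), pitman_t_via_xi_eta(x, y, 1.5))
```

The two values agree to rounding error. To use the statistic as a test, it would be referred to "Student's" distribution with n − 2 degrees of freedom.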


21.4. If an even number n of members of a sample come from a population with 
mean μ, show how to find a sample of half the size distributed with twice the variance 
about zero mean. Hence show how to extend the result of Exercise 21.2 to the case where 
the population mean is not zero. 


21.5. If a parameter admits of a sufficient estimator, show that a test of its significance 
can be derived direct from the likelihood function. 


21.6. Derive equations (21.47) and (21.48). 
21.7. Let $h_{11}, h_{12}, \ldots, h_{1,n-1}$ be (n − 1) linear functions of the observations which 
are orthogonal to one another and to $\bar x_1$, and let them have zero mean and variance $\sigma_1^2$. 
Similarly define $h_{2j}$. 

Then, in two samples of n from normal populations with equal means and variances 
$\sigma_1^2$ and $\sigma_2^2$, the function 

$$\frac{\sqrt{n}\,(\bar x_1 - \bar x_2)}{\left\{\sum_{j=1}^{n-1}(h_{1j} + h_{2j})^2/(n-1)\right\}^{\frac12}}$$

will be distributed as “ Student’s ” t with n − 1 degrees of freedom. 

(Bartlett, 1937c, and Welch, 1938b. The test does not depend on the ratio $\sigma_1/\sigma_2$ and 
can be extended to the case of unequal sample numbers, but only at the expense of losing 
efficiency in the sense that the degrees of freedom number one less than the lower of the sample 
numbers.) 

21.8. Given two samples of $n_1$, $n_2$ members from normal populations with unequal 
variances, show that by picking $n_1$ members at random from the second (where $n_2 > n_1$) and 
pairing them at random with the members of the first sample, a test of significance of 
difference of means can be based on “ Student’s ” distribution independently of the variance ratio in the populations. (This test, again, is exact, but sacrifices the information of 
$n_2 - n_1$ members of the second sample.) 


21.9. If z is the ratio of the sample mean to sample standard deviation in normal 
samples, and n is large enough for the distribution of the variance to be regarded as normal, 
show that 


V{ <M- 2T» -17) 


is distributed approximately normally with zero mean and unit variance, where 



7 

32n2 ’ 

(Hendricks, 1936.) 


21.10. If x, y have a continuous frequency function f(x, y), their characteristic 
function is 

$$\phi(u, v) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp(iux + ivy)\,f(x, y)\,dx\,dy.$$

Show that the distribution of x when y is given has a characteristic function 

$$\phi(u \mid y) = \frac{\int_{-\infty}^{\infty} e^{-ivy}\,\phi(u, v)\,dv}{\int_{-\infty}^{\infty} e^{-ivy}\,\phi(0, v)\,dv}.$$

(Bartlett, 1938b.) 


21.11. If a set of parameters $\theta_1 \ldots \theta_p$ admit of a set of sufficient estimators, show 
that conditional inferences independent of $\theta_1 \ldots \theta_p$ are possible, the conditions being 
that the estimators are constant for the samples concerned. Conversely, if conditional 
inference is possible, the irrelevant parameters must admit a set of sufficient estimators. 

(Bartlett, 1937c.) 

21.12. In a normal sample of n values show that if 

-/ ^ - iCl 

V(2») 

n 

and ns'* = — n*'* = i (*x + a!j)® + ^ x^, 

where $x_1$, $x_2$ are two sample values taken at random, then 



X 

is distributed in the same form as “ Student’s ” ratio z when the parent mean is 
zero. Show further that 

1 aC| < 1. 

(Neyman, Lectures and Conferences on Mathematical Statistics, 1938. The example shows 
that if z is “ significantly ” large, C must be small and hence the two criteria based on z and C 
lead to opposite conclusions.) 

21.13. In a 2 × 2 contingency table, show that the border relative frequencies 
are, on the hypothesis of independence, sufficient estimators for the probability of success 
of the two attributes defining the table. Hence derive the exact test of significance in 
such a table as a conditional inference. (The exact test is given in 12.16, vol. I, p. 303.) 

(Bartlett, 1937c.) 

21.14. If two samples are drawn from a bivariate normal population, and 
are their covariances, Fn and Fm are the variances of the pooled samples, and Fix its 
covariance, show that the distribution function 

F„) 

is independent of the parent variances and correlation. Hence show that the distribution 
would provide a test of the difference of sample covariances. 

(Bartlett, 1937c.) 

21.15. If two samples $x_1 \ldots x_n$ and $y_1 \ldots$ are drawn from populations which 
differ only in location and the difference in means is d, show by considering the values 
typified by x + d and y how to set confidence limits to d, based on the distribution of 
w of equation (21.82). 

(Pitman, 1937a.) 

21.16. In the previous exercise show that the confidence limits for d are the same 
as those based on “ Student’s ” distribution in the case of normal populations with different 
means and identical variances (equation (21.32)). Explain why the latter test is only 
valid for normal populations, whereas the former is valid for any population. 
The Analytical Theory of Regression 

22.1. When considering the theory of correlation in Chapters 14 and 16 we introduced 
the concept of linear regression of one variate on a set of “ independent ” variates. We 
shall now study this subject more fully and extend the theory to the case where the regression lines are not straight. In the first instance we confine our attention to bivariate 
populations, but the majority of our results are easily generalised to the multivariate case. 

In speaking of one variate as “ dependent ” and the others as “ independent ” we 
introduce what may be a source of confusion. In general, all the variates are dependent 
in the statistical sense, each on the others, and in special cases may even be functionally 
dependent. In selecting one for separate consideration and in discussing its dependence 
on the others we are usually attempting to solve a problem in estimation: for given values 
of the other variates, what is the best estimator of the “ dependent ” variate, or its central 
value in the distribution which it has for such given values? The idea of “ given ” values, 
that is to say values which can be selected at will, leads to our referring to them as “ independent ”, though they may be statistically dependent one on another. It might perhaps 
be better to use different words, but the practice is so common that we make no attempt 
to improve it. Once the point has been understood no difficulty arises in practice. 

22.2. If we have two variates x, y with frequency function f(x, y), then for any 
fixed value of y the mean of x, say $\bar x_y$, is given by 

$$\bar x_y = \int_{-\infty}^{\infty} x f(x, y)\,dx \Big/ \int_{-\infty}^{\infty} f(x, y)\,dx. \tag{22.1}$$

The expression on the right is a function of y and thus the points whose co-ordinates 
are $(\bar x_y, y)$ have a locus which is, in general, a smooth curve. This curve is defined as the 
line of regression of x on y, and may be written 

$$X = \frac{\int_{-\infty}^{\infty} x f(x, Y)\,dx}{\int_{-\infty}^{\infty} f(x, Y)\,dx}, \tag{22.2}$$

where X, Y are the current co-ordinates. Similarly there will be a line of regression of 
y on x given by 

$$Y = \frac{\int_{-\infty}^{\infty} y f(X, y)\,dy}{\int_{-\infty}^{\infty} f(X, y)\,dy}. \tag{22.3}$$

We shall take Y to represent the dependent variate throughout this chapter. 
22.3. We may also consider the more general curves typified by 

$$Y = \frac{\int_{-\infty}^{\infty} y^r f(X, y)\,dy}{\int_{-\infty}^{\infty} f(X, y)\,dy}, \tag{22.4}$$

the regression now being of the rth moment of y on x. If r = 1 we have the regression 
of the first moment, or simply the regression. If r = 2 and y is measured from the mean 
we have the so-called scedastic curve of y on x, 

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar y_x)^2 f(X, y)\,dy}{\int_{-\infty}^{\infty} f(X, y)\,dy}, \tag{22.5}$$

which shows how the variance of y varies with x. Other forms which have been studied 
are the clitic curve 

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar y_x)^3 f(X, y)\,dy}{\int_{-\infty}^{\infty} f(X, y)\,dy} \tag{22.6}$$

and the kurtic curve 

$$Y = \frac{\int_{-\infty}^{\infty} (y - \bar y_x)^4 f(X, y)\,dy}{\int_{-\infty}^{\infty} f(X, y)\,dy}. \tag{22.7}$$

These curves correspond to the moments of a univariate distribution, and the main 
characteristics of a bivariate form may be studied with their aid in much the same way 
as the lower moments can be used to summarise the properties of a univariate form. 
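For data recorded as a table of frequencies, the sample analogues of (22.3) to (22.7) are simply the moments of each y-array. The following minimal Python sketch assumes the table is held as a dictionary mapping each x-value to its (y, frequency) pairs. That layout, like the function name, is our own choice.

```python
def array_moment_curve(table, r):
    """Sample analogue of (22.4)-(22.7) for a frequency table.

    `table` maps each x to a list of (y, frequency) pairs. For r = 1 the
    result is the mean of each y-array (the regression of the mean); for
    r >= 2 it is the r-th moment of the array about its own mean
    (r = 2 scedastic, r = 3 clitic, r = 4 kurtic).
    """
    curve = {}
    for x, array in table.items():
        total = sum(f for _, f in array)
        mean = sum(y * f for y, f in array) / total
        if r == 1:
            curve[x] = mean
        else:
            curve[x] = sum((y - mean) ** r * f for y, f in array) / total
    return curve


table = {0: [(1, 1), (3, 1)], 1: [(2, 2), (4, 2)], 2: [(5, 3)]}
print(array_moment_curve(table, 1))  # {0: 2.0, 1: 3.0, 2: 5.0}
print(array_moment_curve(table, 2))  # {0: 1.0, 1: 1.0, 2: 0.0}
```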

22.4. It is interesting to remark that, just as we can find the moments direct from 
the characteristic function, so also we may ascertain the regressions of moments from 
the bivariate characteristic function, even when the distribution function itself is not 
explicitly given. 

Let us write the frequency function in the form 

$$f(x, y) = g(x)\,g_x(y), \tag{22.8}$$

where g(x) is the total frequency for any given x and $g_x(y)$ is the frequency of y for any 
given x. In the notation of the theory of probability we should write this 

$$f(x, y) = p(x)\,p(y \mid x).$$

The characteristic function of x and y is then 

$$\phi(t_1, t_2) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\exp(it_1x + it_2y)\,g(x)\,g_x(y)\,dx\,dy = \int_{-\infty}^{\infty} e^{it_1x}\,g(x)\,\psi_x(t_2)\,dx, \tag{22.9}$$

where 

$$\psi_x(t_2) = \int_{-\infty}^{\infty} e^{it_2y}\,g_x(y)\,dy \tag{22.10}$$

and is the c.f. of y for a given x. 

If the rth moment of y about the origin for a given x is $\mu'_{rx}$, we have 

$$\left[\frac{\partial^r \psi_x(t_2)}{\partial t_2^r}\right]_{t_2 = 0} = i^r\mu'_{rx},$$

and hence, from (22.9), 

$$\left[\frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r}\right]_{t_2 = 0} = i^r\int_{-\infty}^{\infty} e^{it_1x}\,g(x)\,\mu'_{rx}\,dx. \tag{22.11}$$

Thus, by the Inversion Theorem, 

$$g(x)\,\mu'_{rx} = \frac{1}{2\pi i^r}\int_{-\infty}^{\infty} e^{-it_1x}\left[\frac{\partial^r \phi(t_1, t_2)}{\partial t_2^r}\right]_{t_2 = 0} dt_1, \tag{22.12}$$

subject, of course, to conditions of existence. This gives us the required expression for 
$\mu'_{rx}$ in terms of x, and the regression can be written down at once. 


22.5. Since 

$$\phi(t_1, t_2) = \exp\left\{\sum_{r + s \geq 1}\kappa_{rs}\frac{(it_1)^r(it_2)^s}{r!\,s!}\right\},$$

we have 

$$\left[\frac{\partial \phi(t_1, t_2)}{\partial t_2}\right]_{t_2 = 0} = i\,\phi(t_1, 0)\sum_{j=0}^{\infty}\kappa_{j1}\frac{(it_1)^j}{j!}, \tag{22.13}$$

and $\phi(t_1, 0)$ may be written $\phi(t_1)$, being the characteristic function of g(x). We also 
have, subject to existence conditions, writing D for d/dx, 

$$(-D)^j g(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty}(it_1)^j e^{-it_1x}\,\phi(t_1)\,dt_1. \tag{22.14}$$

Hence, from (22.12), (22.13) and (22.14) we find 

$$g(x)\,\mu'_{1x} = \sum_{j=0}^{\infty}\frac{\kappa_{j1}}{j!}(-D)^j g(x), \tag{22.15}$$

provided that the interchange of summation and integration in the last step is legitimate. 
Thus we have, for the regression of the mean, 

$$Y = \sum_{j=0}^{\infty}\frac{\kappa_{j1}}{j!}\,\frac{(-D)^j g(X)}{g(X)}. \tag{22.16}$$

This notable result is due to Wicksell (1934b). The expansion is valid if the cumulants 
exist and if g(x) and its derivatives are continuous in the range and zero at its extremes; 
for then the interchange of summation and integration in arriving at (22.15) is legitimate. 
In particular, if g(x) is normal and in standard measure we have 

$$Y = \sum_{j=0}^{\infty}\frac{\kappa_{j1}}{j!}H_j(X), \tag{22.17}$$

where $H_j(x)$ is the Tchebycheff-Hermite polynomial of order j (6.20, vol. I, p. 146). 
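Equation (22.17) can be evaluated term by term once the cumulants $\kappa_{j1}$ are known. The sketch below assumes two things: x is already in standard measure, and the series is truncated at the cumulants supplied. $H_j$ is generated from the standard recurrence $H_{j+1}(x) = xH_j(x) - jH_{j-1}(x)$, which gives $H_2 = x^2 - 1$ and $H_3 = x^3 - 3x$. The function names are illustrative only.

```python
from math import factorial


def hermite(j, x):
    """Tchebycheff-Hermite polynomial H_j(x), defined by (-D)^j a(x) = H_j(x) a(x)
    for the standard normal density a(x), via H_{k+1} = x H_k - k H_{k-1}."""
    h_prev, h = 1.0, x            # H_0 and H_1
    if j == 0:
        return h_prev
    for k in range(1, j):
        h_prev, h = h, x * h - k * h_prev
    return h


def wicksell_regression(x, kappa):
    """Truncated form of (22.17): sum over the supplied j of kappa[j] / j! * H_j(x).

    `kappa` maps j to the cumulant kappa_{j1}, computed with x in standard measure.
    """
    return sum(k / factorial(j) * hermite(j, x) for j, k in kappa.items())


# Bivariate normal about the mean (Example 22.1): with x standardised,
# kappa_{11} = rho * sigma_2 and all other kappa_{j1} vanish, so Y = rho * sigma_2 * X.
print(wicksell_regression(1.2, {0: 0.0, 1: 0.5 * 2.0}))  # 1.2
```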
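Example 22.1 

For the bivariate normal distribution about the mean we have 

$$\phi(t_1, t_2) = \exp\{-\tfrac12(\sigma_1^2t_1^2 + 2\rho\sigma_1\sigma_2t_1t_2 + \sigma_2^2t_2^2)\}.$$

Hence 

$$\left[\frac{\partial \phi}{\partial t_2}\right]_{t_2 = 0} = -\rho\sigma_1\sigma_2\,t_1\exp(-\tfrac12\sigma_1^2t_1^2),$$

and from (22.12) 

$$g(x)\,\mu'_{1x} = -\frac{\rho\sigma_1\sigma_2}{2\pi i}\int_{-\infty}^{\infty} t_1\exp(-\tfrac12\sigma_1^2t_1^2 - it_1x)\,dt_1 = \frac{\rho\sigma_2\,x}{\sigma_1^2\sqrt{(2\pi)}}\,e^{-x^2/2\sigma_1^2}.$$

Hence, since $g(x) = \dfrac{1}{\sigma_1\sqrt{(2\pi)}}\,e^{-x^2/2\sigma_1^2}$, 

$$\mu'_{1x} = \frac{\rho\sigma_2}{\sigma_1}x$$

and 

$$Y = \frac{\rho\sigma_2}{\sigma_1}X,$$

the familiar relation of linearity for the regression of the mean of the normal distribution. 
Alternatively, direct from (22.17) we have, since $\kappa_{01} = 0$ and $\kappa_{j1} = 0$ for $j > 1$, and x must 
be taken in standard measure, 

$$Y = \frac{\kappa_{11}}{\sigma_1}H_1\!\left(\frac{X}{\sigma_1}\right) = \frac{\rho\sigma_1\sigma_2}{\sigma_1^2}X = \frac{\rho\sigma_2}{\sigma_1}X, \quad \text{as before.}$$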

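Example 22.2 (Wicksell, 1934b) 

Consider the frequency distribution of $\xi = \Sigma(x^2)$ and $\eta = \Sigma(y^2)$, where x, y are 
samples of n from the bivariate normal population 

$$dF \propto \exp\left\{-\frac{1}{2(1 - \rho^2)}(x^2 - 2\rho xy + y^2)\right\}dx\,dy.$$

The characteristic function is 

$$\phi = \left[\iint \exp(it_1x^2 + it_2y^2)\,dF\right]^n = \{(1 - \theta_1)(1 - \theta_2) - \rho^2\theta_1\theta_2\}^{-\frac12 n},$$

where $\theta_1 = 2it_1$ and $\theta_2 = 2it_2$. 

The distribution function cannot be expressed in a simple form, but we may determine 
the regressions without it. We have 

$$\left[\frac{\partial^r\phi}{\partial t_2^r}\right]_{t_2 = 0} = (2i)^r\,\frac{\Gamma(\frac12 n + r)}{\Gamma(\frac12 n)}\,\frac{\{1 - (1 - \rho^2)\theta_1\}^r}{(1 - \theta_1)^{\frac12 n + r}}.$$

Thus, from (22.12), 

$$g(\xi)\,\mu'_{r\xi} = \frac{2^r\,\Gamma(\frac12 n + r)}{\Gamma(\frac12 n)}\,\frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-it_1\xi}\,\frac{\{1 - (1 - \rho^2)\theta_1\}^r}{(1 - \theta_1)^{\frac12 n + r}}\,dt_1.$$

The integrals may be evaluated by successive application of 

$$\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{e^{-it\xi}}{(1 - 2it)^k}\,dt = \frac{\xi^{k-1}e^{-\frac12\xi}}{2^k\,\Gamma(k)}, \qquad \xi > 0,$$

and we find, for the regression of η on ξ, 

$$\mu'_{1\xi} = n(1 - \rho^2) + \rho^2\xi,$$

$$\mu_{2\xi} = \mu'_{2\xi} - (\mu'_{1\xi})^2 = 2(1 - \rho^2)\{n(1 - \rho^2) + 2\rho^2\xi\}.$$

Thus the regressions of both mean and variance of η on ξ are linear. 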


Fitting of Curvilinear Regression Lines 


22.6. From the practical point of view the case we have just considered, namely, 
the one where the distribution or characteristic function is given, is exceptional. The 
determination of regression curves has, in the majority of cases, to be carried out from 
numerically specified material, which we shall consider in the remainder of the chapter. 
We shall confine our attention to the regression of the mean. 

In general the means of arrays will not lie exactly on a smooth curve (unless of course 
we choose a curve of order equal to the number of points to be fitted, less one). Nor do 
we know a priori what is the appropriate degree of a polynomial which will approx- 
imately represent the regression line. Let us, however, assume that the regression can 
be represented by a polynomial of order p: 

$$Y = a_0 + a_1X + a_2X^2 + \cdots + a_pX^p. \tag{22.18}$$

We will consider later how the appropriate value of p is to be determined in particular 
cases. Our problem is to determine the coefficients a from the data. As usual, we appeal 
to the principle of least squares, that is to say, we find the values of the a's which will 
minimise 

$$U = \Sigma(y - a_0 - a_1x - \cdots - a_px^p)^2, \tag{22.19}$$

the summation extending over the sample values. 

Differentiating with respect to $a_j$, we have 

$$\Sigma(x^jy) - a_0\Sigma(x^j) - a_1\Sigma(x^{j+1}) - \cdots - a_p\Sigma(x^{j+p}) = 0,$$

and similar equations for j = 0, . . . p. Writing the moments without primes for simplicity and letting $\mu_j$ represent the jth moment of x, and $\mu_{j1}$ the bivariate moment 
$\Sigma(x^jy)/n$, we have 

$$\begin{aligned}
a_0\mu_0 + a_1\mu_1 + \cdots + a_p\mu_p &= \mu_{01}\\
a_0\mu_1 + a_1\mu_2 + \cdots + a_p\mu_{p+1} &= \mu_{11}\\
&\;\;\vdots\\
a_0\mu_p + a_1\mu_{p+1} + \cdots + a_p\mu_{2p} &= \mu_{p1}
\end{aligned} \tag{22.20}$$

Writing now 

$$\Delta^{(p)} = \begin{vmatrix}\mu_0 & \mu_1 & \cdots & \mu_p\\ \mu_1 & \mu_2 & \cdots & \mu_{p+1}\\ \vdots & & & \vdots\\ \mu_p & \mu_{p+1} & \cdots & \mu_{2p}\end{vmatrix} \tag{22.21}$$


and $\Delta_j^{(p)}$ for the determinant obtained by substituting the product-moments $\mu_{01}, \ldots, \mu_{p1}$ 
for the (j + 1)th column, we have, as the solution of (22.20), 

$$a_j = \frac{\Delta_j^{(p)}}{\Delta^{(p)}}. \tag{22.22}$$
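The moment equations (22.20) can also be solved directly by machine. The following sketch uses exact rational arithmetic and the data of Table 22.1 (Example 22.3), taking the entries at 191° and 212° as 5·74 and 6·51. These values reproduce the sums Σ(y) = 82·97 and Σ(yx) = 14,736·19 quoted there. It works with sums rather than moments, which leaves the solution (22.22) unchanged. Gauss-Jordan elimination is substituted for the determinantal form, and the helper names are our own.

```python
from fractions import Fraction

# Data of Table 22.1: temperature X (degrees F.) and percentage loss in weight Y.
X = [100, 105, 110, 115, 121, 132, 144, 153, 163, 179, 191, 203, 212, 226, 237, 251]
Y = ["3.71", "3.81", "3.86", "3.93", "3.96", "4.20", "4.34", "4.51",
     "4.73", "5.35", "5.74", "6.14", "6.51", "6.98", "7.44", "7.76"]


def solve_exact(matrix, rhs):
    """Gauss-Jordan elimination on Fractions; assumes the matrix is non-singular."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for c in range(n):
        pivot = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[pivot] = rows[pivot], rows[c]
        for r in range(n):
            if r != c and rows[r][c] != 0:
                f = rows[r][c] / rows[c][c]
                rows[r] = [u - f * v for u, v in zip(rows[r], rows[c])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def least_squares_polynomial(xs, ys, p):
    """Coefficients a_0 .. a_p of (22.18) from the normal equations (22.20).

    Sums replace moments; dividing every equation by n leaves the solution unchanged.
    """
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    s = [sum(x ** k for x in xs) for k in range(2 * p + 1)]                 # sums of x^k
    s_y = [sum(y * x ** j for x, y in zip(xs, ys)) for j in range(p + 1)]  # sums of y x^j
    matrix = [[s[j + k] for k in range(p + 1)] for j in range(p + 1)]
    return solve_exact(matrix, s_y)


line = least_squares_polynomial(X, Y, 1)
print([float(a) for a in line])  # intercept about 0.660, slope about 0.0274
```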

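22.7. It might appear that this solution could break down if $\Delta^{(p)} = 0$. Such a 
breakdown is excluded in practice. For if G(x) is the distribution function of x, so that 
$\mu_j = \int x^j\,dG$, we may write each row of $\Delta^{(p)}$ as an integral with its own variable of 
integration ($x_0$ for the first row, $x_1$ for the second, and so on), and we have 

$$\Delta^{(p)} = \int\!\!\cdots\!\!\int\begin{vmatrix}1 & x_0 & \cdots & x_0^p\\ x_1 & x_1^2 & \cdots & x_1^{p+1}\\ \vdots & & & \vdots\\ x_p^p & x_p^{p+1} & \cdots & x_p^{2p}\end{vmatrix} dG_0\,dG_1\cdots dG_p = \int\!\!\cdots\!\!\int x_1x_2^2\cdots x_p^p\,D\,dG_0\,dG_1\cdots dG_p,$$

where 

$$D = \begin{vmatrix}1 & x_0 & \cdots & x_0^p\\ 1 & x_1 & \cdots & x_1^p\\ \vdots & & & \vdots\\ 1 & x_p & \cdots & x_p^p\end{vmatrix}.$$

If we now permute the suffixes of the x's in all possible ways and sum the (p + 1)! resultants 
we obtain, in virtue of the definition of a determinant, 

$$(p + 1)!\,\Delta^{(p)} = \int\!\!\cdots\!\!\int D^2\,dG_0\,dG_1\cdots dG_p, \tag{22.23}$$

and hence $\Delta^{(p)}$ is essentially positive. 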
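The positivity argument of 22.7 can be verified exactly when G is the empirical distribution of a small sample. In that case each integral becomes an average over all $n^{p+1}$ choices of $x_0, \ldots, x_p$ from the sample. The following Python sketch uses exact fractions; the helper names are our own.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def det(m):
    """Determinant of a square matrix of Fractions by Gaussian elimination."""
    a = [list(row) for row in m]
    n = len(a)
    result = Fraction(1)
    for c in range(n):
        pivot = next((r for r in range(c, n) if a[r][c] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            result = -result          # a row interchange changes the sign
        result *= a[c][c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            a[r] = [u - f * v for u, v in zip(a[r], a[c])]
    return result


def moment_determinant(xs, p):
    """Delta^(p) of (22.21) for the empirical distribution of the sample xs."""
    n = len(xs)
    mu = [sum(Fraction(x) ** k for x in xs) / n for k in range(2 * p + 1)]
    return det([[mu[i + j] for j in range(p + 1)] for i in range(p + 1)])


def mean_squared_vandermonde(xs, p):
    """Right-hand side of (22.23): the average of D^2 over all (p+1)-tuples from xs."""
    n = len(xs)
    total = Fraction(0)
    for tup in product(xs, repeat=p + 1):
        d = det([[Fraction(t) ** k for k in range(p + 1)] for t in tup])
        total += d * d
    return total / n ** (p + 1)


xs = [1, 2, 4, 7]
lhs = factorial(3) * moment_determinant(xs, 2)
print(lhs, lhs == mean_squared_vandermonde(xs, 2))  # the identity holds exactly, so True
```

A sample with fewer than p + 1 distinct values makes every D vanish. In that case $\Delta^{(p)} = 0$, which is the one situation the word "essentially" allows for.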


22.8. From (22.18) and (22.22) we see that the regression line may be written 

$$\begin{vmatrix}Y & 1 & X & \cdots & X^p\\ \mu_{01} & \mu_0 & \mu_1 & \cdots & \mu_p\\ \mu_{11} & \mu_1 & \mu_2 & \cdots & \mu_{p+1}\\ \vdots & & & & \vdots\\ \mu_{p1} & \mu_p & \mu_{p+1} & \cdots & \mu_{2p}\end{vmatrix} = 0. \tag{22.24}$$

This is a formal solution of our problem. The moments μ can be obtained from observation, 
and equation (22.24) then gives the regression line. 

It will be observed that in order to preserve the symmetry we have written $\mu_0$ for 
the total frequency unity. 

22.9. A somewhat different approach leads to the same solution. If we assume 
that the regression line is a parabolic curve of order p, we may find the coefficients by the 
principle of moments. This would lead us to identify the lower moments 

$$\Sigma(x^jy) = \Sigma\{x^j(a_0 + a_1x + \cdots + a_px^p)\}$$

as far as was necessary to determine the a's. This clearly leads back to equation (22.20). 

Orthogonal Polynomials 

22.10. The use of equation (22.24) in practice is subject to one serious drawback. 
If we have a set of data and no guide, apart from inspection, to the appropriate value of 
p, the only course is to fit curves of order 1, 2, 3, . . . and so forth, until we reach the point 
when further terms do not improve the fit. Every time we add a new term the determinantal arithmetic has to be done afresh. To obviate this nuisance we shall consider the 
regression line in the form 

$$Y = b_0P_0 + b_1P_1 + \cdots + b_pP_p, \tag{22.25}$$

where the P's are polynomials in X, $P_j$ being of degree j. We shall determine the P's 
so that 

$$\Sigma(P_jP_k) = 0, \qquad j \neq k, \tag{22.26}$$

the summation extending over the observed values. 

In minimising 

$$\Sigma(y - b_0P_0 - b_1P_1 - \cdots - b_pP_p)^2$$

we shall have equations such as 

$$\Sigma(yP_j) - b_0\Sigma(P_0P_j) - \cdots - b_p\Sigma(P_pP_j) = 0,$$

and in virtue of the orthogonal relations (22.26), this reduces to 

$$\Sigma(yP_j) - b_j\Sigma(P_j^2) = 0. \tag{22.27}$$

Thus $b_j$ is determined simply by $P_j$; and if, having fitted a curve of order p, we wish to 
go a step farther and add a term $b_{p+1}P_{p+1}$, the coefficients $b_0 \ldots b_p$ found from (22.27) 
remain unaltered. 

22.11. Furthermore, the use of these orthogonal polynomials will give us a very 
convenient method of determining step by step the goodness of fit of the regression line. 
We have 

$$U = \Sigma(y - b_0P_0 - \cdots - b_pP_p)^2 = \Sigma(y^2) - 2b_0\Sigma(yP_0) - \cdots - 2b_p\Sigma(yP_p) + b_0^2\Sigma(P_0^2) + \cdots + b_p^2\Sigma(P_p^2),$$

the product terms vanishing by (22.26). But from (22.27) we may express $\Sigma(yP_j)$ in terms of $\Sigma(P_j^2)$, and we thus find 

$$U = \Sigma(y^2) - b_0^2\Sigma(P_0^2) - \cdots - b_p^2\Sigma(P_p^2). \tag{22.28}$$

Thus the effect of any term $b_jP_j$ is to reduce U by $b_j^2\Sigma(P_j^2)$, and we may examine the effect 
of this term on U separately. If we find that the addition of any term $b_pP_p$ does not 
reduce U significantly, we may conclude that it is redundant (so far as concerns the 
representation of a regression line by a polynomial). 
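The step-by-step fitting of 22.10 and 22.11 can be sketched as follows. Instead of the determinants of 22.12, the sketch generates the monic polynomials by the three-term recurrence $P_{k+1} = (x - \alpha_k)P_k - \beta_kP_{k-1}$. Here $\alpha_k = \Sigma(xP_k^2)/\Sigma(P_k^2)$ and $\beta_k = \Sigma(P_k^2)/\Sigma(P_{k-1}^2)$. This is a standard construction, substituted here for convenience, and it gives the same orthogonal polynomials with unit leading coefficient. The data are those of Table 22.1, with the entries at 191° and 212° read as 5·74 and 6·51, which reproduce the sums quoted in Example 22.3. The function name is our own.

```python
# Data of Table 22.1 (Example 22.3).
X = [100, 105, 110, 115, 121, 132, 144, 153, 163, 179, 191, 203, 212, 226, 237, 251]
Y = [3.71, 3.81, 3.86, 3.93, 3.96, 4.20, 4.34, 4.51,
     4.73, 5.35, 5.74, 6.14, 6.51, 6.98, 7.44, 7.76]


def orthogonal_polynomial_fit(xs, ys, degree):
    """Fit y on monic polynomials P_0 .. P_degree orthogonal over the observed xs.

    Returns the coefficients b_j of (22.27), the residual U after each term
    as in (22.28), and the fitted values.
    """
    n = len(xs)
    p_prev, p_curr = [0.0] * n, [1.0] * n        # P_{-1} = 0, P_0 = 1
    norm_prev = 1.0
    u = sum(y * y for y in ys)                    # sum of y^2
    fitted = [0.0] * n
    bs, us = [], []
    for k in range(degree + 1):
        norm = sum(p * p for p in p_curr)         # sum of P_k^2
        b = sum(y * p for y, p in zip(ys, p_curr)) / norm   # (22.27)
        bs.append(b)
        u -= b * b * norm                         # U falls by b_k^2 * sum(P_k^2)
        us.append(u)
        fitted = [f + b * p for f, p in zip(fitted, p_curr)]
        alpha = sum(x * p * p for x, p in zip(xs, p_curr)) / norm
        beta = norm / norm_prev if k > 0 else 0.0
        p_prev, p_curr = p_curr, [(x - alpha) * pc - beta * pp
                                  for x, pc, pp in zip(xs, p_curr, p_prev)]
        norm_prev = norm
    return bs, us, fitted


bs, us, fitted = orthogonal_polynomial_fit(X, Y, 3)
for j, (b, u) in enumerate(zip(bs, us)):
    print(j, f"{b:.5g}", f"{u:.4f}")
```

Fitting to a lower degree returns exactly the leading coefficients of the higher-degree fit, which is the property claimed at the end of 22.10. The residuals computed here from unrounded $b_j$ differ slightly in the third decimal from the tabulated values of Example 22.3. That table was built from rounded coefficients, and the text itself warns that U is sensitive to rounding.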

22.12. We proceed then to derive expressions for the orthogonal polynomials in the 
general case. Later we shall examine the important special case when the values of x 
are equidistant (as, for instance, with grouped data and most time-series). 

Put 

$$P_p = \sum_{j=0}^{p} c_{pj}X^j. \tag{22.29}$$

In this expression there are (p + 1) unknown constants c, and hence in all the polynomials 
up to and including those of the pth order there are ½(p + 1)(p + 2) constants. The 
orthogonal relations up to and including order p will then provide ½p(p + 1) conditions 
on the c's, so that p + 1 constants are assignable at will. We will take one for each P and 
assign it so that the coefficient of $X^j$ in $P_j$ is unity: 

$$c_{jj} = 1. \tag{22.30}$$

In particular $c_{00} = P_0 = 1$. The orthogonal relations are then just sufficient to determine 
the other c's. For instance, for the set $c_{pj}$, j = 0 . . . p − 1, they are 

$$\Sigma P_pP_0 = \Sigma P_p = 0, \qquad \Sigma P_pP_1 = 0, \qquad \ldots$$

and so on. This system is clearly equivalent to the p equations 

$$\Sigma P_p = 0, \qquad \Sigma xP_p = 0, \qquad \ldots, \qquad \Sigma x^{p-1}P_p = 0. \tag{22.31}$$

On substituting for the P's from (22.29) we get 

$$\begin{aligned}
c_{p0}\mu_0 + c_{p1}\mu_1 + \cdots + c_{p,p-1}\mu_{p-1} + \mu_p &= 0\\
c_{p0}\mu_1 + c_{p1}\mu_2 + \cdots + c_{p,p-1}\mu_p + \mu_{p+1} &= 0\\
&\;\;\vdots\\
c_{p0}\mu_{p-1} + c_{p1}\mu_p + \cdots + c_{p,p-1}\mu_{2p-2} + \mu_{2p-1} &= 0.
\end{aligned}$$

The solution may be expressed as a determinant in the usual way. Writing $\Delta^{(p)}$ in accordance with (22.21) and $\Delta_{pj}$ for the cofactor of the term in the last row and (j + 1)th column 
of $\Delta^{(p)}$, we find 

$$c_{pj} = \frac{\Delta_{pj}}{\Delta^{(p-1)}}. \tag{22.32}$$

This expresses the c's in terms of the ascertainable constants μ. It follows that 

$$P_p = \frac{1}{\Delta^{(p-1)}}\begin{vmatrix}\mu_0 & \mu_1 & \cdots & \mu_p\\ \mu_1 & \mu_2 & \cdots & \mu_{p+1}\\ \vdots & & & \vdots\\ \mu_{p-1} & \mu_p & \cdots & \mu_{2p-1}\\ 1 & X & \cdots & X^p\end{vmatrix}. \tag{22.33}$$

We notice in particular that, in virtue of the diagonal symmetry of $\Delta^{(p)}$, we have 

$$\Delta_{jk} = \Delta_{kj}.$$
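Equations (22.31), written in moments, form a small linear system for $c_{p0}, \ldots, c_{p,p-1}$. The sketch below solves it by Gaussian elimination rather than by the determinants of (22.32). It then applies the method to the standardised moments of Example 22.4, with the helper names our own. One quick check on the output: when $\mu_1 = 0$ and $\mu_2 = 1$, orthogonality to $P_0$ alone forces $c_{30} = -(\mu_3 + c_{32})$.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting, then back-substitution."""
    n = len(rhs)
    a = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, rhs)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        a[c], a[pivot] = a[pivot], a[c]
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            for k in range(c, n + 1):
                a[r][k] -= f * a[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (a[i][n] - sum(a[i][k] * x[k] for k in range(i + 1, n))) / a[i][i]
    return x


def orthogonal_coefficients(mu, p):
    """Return [c_p0, ..., c_{p,p-1}] of the monic P_p from the moment form of (22.31):

        sum_j c_pj * mu[i + j] = -mu[i + p],   i = 0 .. p-1   (with c_pp = 1).
    """
    matrix = [[mu[i + j] for j in range(p)] for i in range(p)]
    rhs = [-mu[i + p] for i in range(p)]
    return solve(matrix, rhs)


# Standardised moments mu_0 .. mu_5 of x in Example 22.4.
mu = [1.0, 0.0, 1.0, 1.705375, 6.295759, 20.729861]
for p in (1, 2, 3):
    print(p, [round(c, 3) for c in orthogonal_coefficients(mu, p)])
# p = 3 gives approximately [1.766, -0.376, -3.471], i.e. X^3 - 3.471X^2 - 0.376X + 1.766
```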

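22.13. In virtue of (22.31) we have 

$$\Sigma(P_p^2) = \Sigma(x^pP_p), \tag{22.34}$$

and thus, from (22.33), on multiplying the last row by $x^p$ and summing, 

$$\Sigma(P_p^2) = \frac{n\,\Delta^{(p)}}{\Delta^{(p-1)}}. \tag{22.35}$$

Similarly 

$$\Sigma(yP_p) = \frac{n}{\Delta^{(p-1)}}\begin{vmatrix}\mu_0 & \mu_1 & \cdots & \mu_p\\ \vdots & & & \vdots\\ \mu_{p-1} & \mu_p & \cdots & \mu_{2p-1}\\ \mu_{01} & \mu_{11} & \cdots & \mu_{p1}\end{vmatrix}. \tag{22.36}$$

Finally, from (22.27) 

$$b_p = \frac{1}{\Delta^{(p)}}\begin{vmatrix}\mu_0 & \mu_1 & \cdots & \mu_p\\ \vdots & & & \vdots\\ \mu_{p-1} & \mu_p & \cdots & \mu_{2p-1}\\ \mu_{01} & \mu_{11} & \cdots & \mu_{p1}\end{vmatrix}. \tag{22.37}$$

Our problem is now solved. We have expressed all the unknowns in terms of 
calculable determinants. 

We may note in passing that since the regression equation must remain covariant 
under a change of origin, all the coefficients b except $b_0$ are seminvariant, and the origin 
can thus be chosen at will. $b_0$ itself is the mean of the y-values. 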

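22.14. Explicitly for the polynomials we have (taking $\mu_1 = 0$, $\mu_2 = 1$) 

$$P_0 = 1, \tag{22.38}$$

$$P_1 = \frac{1}{1}\begin{vmatrix}1 & 0\\ 1 & X\end{vmatrix} = X, \tag{22.39}$$

$$P_2 = \frac{\begin{vmatrix}1 & 0 & 1\\ 0 & 1 & \mu_3\\ 1 & X & X^2\end{vmatrix}}{\begin{vmatrix}1 & 0\\ 0 & 1\end{vmatrix}} = X^2 - \mu_3X - 1, \tag{22.40}$$

$$P_3 = \frac{\begin{vmatrix}1 & 0 & 1 & \mu_3\\ 0 & 1 & \mu_3 & \mu_4\\ 1 & \mu_3 & \mu_4 & \mu_5\\ 1 & X & X^2 & X^3\end{vmatrix}}{\begin{vmatrix}1 & 0 & 1\\ 0 & 1 & \mu_3\\ 1 & \mu_3 & \mu_4\end{vmatrix}} = X^3 - \frac{1}{\mu_4 - \mu_3^2 - 1}\left\{(\mu_5 - \mu_3\mu_4 - \mu_3)X^2 + (\mu_4^2 - \mu_3\mu_5 - \mu_4 + \mu_3^2)X - (\mu_3^3 - 2\mu_3\mu_4 + \mu_5)\right\}, \tag{22.41}$$

and so on. In particular, if the population is normal, 

$$P_1 = X, \qquad P_2 = X^2 - 1, \qquad P_3 = X^3 - 3X, \quad \text{etc.},$$

the polynomials in this case reducing to the Tchebycheff-Hermite functions (6.20) which 
we know to form an orthogonal system in the normal case. 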


Example 22.3. Ungrouped Data 

Table 22.1 shows the relationship between the percentage loss in weight (Y) and the 
temperature (X) in a number of samples of soil. We require to find the regression of Y on X. 
TABLE 22.1 

Fitting of Curvilinear Regression for Ungrouped Data 
(Data from J. B. H. Coutts, J. Agr. Sci., 20, 641.) 

Percentage Loss in Weight (Y)     Temperature, degrees F. (X) 
          3·71                             100 
          3·81                             105 
          3·86                             110 
          3·93                             115 
          3·96                             121 
          4·20                             132 
          4·34                             144 
          4·51                             153 
          4·73                             163 
          5·35                             179 
          5·74                             191 
          6·14                             203 
          6·51                             212 
          6·98                             226 
          7·44                             237 
          7·76                             251 


For the sums required we find: 

$n = 16$, $\Sigma(y) = 82\cdot97$, $\Sigma(y^2) = 459\cdot4363$; 
$\Sigma(x) = 2642$, $\Sigma(x^2) = 474{,}050$, $\Sigma(x^3) = 91{,}244{,}582$; 
$\Sigma(x^4) = 18{,}553{,}164{,}842$, $\Sigma(x^5) = 3{,}930{,}294{,}225{,}302$; 
$\Sigma(x^6) = 858{,}077{,}668{,}642{,}250$; $\Sigma(yx) = 14{,}736\cdot19$; 
$\Sigma(yx^2) = 2{,}819{,}909\cdot45$, $\Sigma(yx^3) = 571{,}902{,}362\cdot11$. 

These can be run off fairly quickly on a machine. We have not bothered to take a different 
mean from those given, but in general a certain amount of arithmetic can be saved by 
so doing. 
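The sums can be checked by machine in the same way. The Python sketch below keeps Y in hundredths so that every sum is an exact integer, and the variable names are our own. It takes the entries at 191° and 212° as 5·74 and 6·51, which are the readings that reproduce Σ(y) = 82·97, Σ(y²) = 459·4363 and Σ(yx) = 14,736·19.

```python
# Table 22.1: X in degrees F.; Y (percentage loss in weight) held in hundredths,
# so that every sum below is an exact integer.
X = [100, 105, 110, 115, 121, 132, 144, 153, 163, 179, 191, 203, 212, 226, 237, 251]
Y100 = [371, 381, 386, 393, 396, 420, 434, 451, 473, 535, 574, 614, 651, 698, 744, 776]


def power_sums(xs, ys100, max_x_power=6, max_yx_power=3):
    """Sums needed for a cubic fit: sums of x^k (k <= 6) and of y x^k (k <= 3).

    The y and y x^k sums come out in hundredths, the y^2 sum in ten-thousandths.
    """
    sums = {"n": len(xs), "y": sum(ys100), "y2": sum(y * y for y in ys100)}
    for k in range(1, max_x_power + 1):
        sums[f"x{k}"] = sum(x ** k for x in xs)
    for k in range(1, max_yx_power + 1):
        sums[f"yx{k}"] = sum(y * x ** k for x, y in zip(xs, ys100))
    return sums


s = power_sums(X, Y100)
print(s["y"] / 100, s["y2"] / 10_000, s["x4"], s["x5"], s["x6"], s["yx3"] / 100)
```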

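Considering first of all the straightforward approach of (22.24), we have for the straight 
line of closest fit 

$$\begin{vmatrix}Y & 1 & X\\ 82\cdot97 & 16 & 2642\\ 14{,}736\cdot19 & 2642 & 474{,}050\end{vmatrix} = 0,$$

reducing to 

$$Y = 0\cdot660 + 2\cdot741\left(\frac{X}{100}\right). \tag{22.42}$$

We have put $n\mu'$ instead of $\mu'$ in the second and third rows of the determinant, as we are 
clearly entitled to do. 

Similarly we find for the second- and third-order parabolas 

$$Y = 3\cdot551 - 0\cdot929\left(\frac{X}{100}\right) + 1\cdot070\left(\frac{X}{100}\right)^2, \tag{22.43}$$

$$Y = 7\cdot782 - 8\cdot940\left(\frac{X}{100}\right) + 5\cdot875\left(\frac{X}{100}\right)^2 - 0\cdot9189\left(\frac{X}{100}\right)^3. \tag{22.44}$$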
Fig. 22.1 shows the straight line and cubic fitted to the data by these means. An examination of the coefficients in the equations illustrates the point made above, that as successive 
terms are added to the polynomials the coefficients of all terms may alter very considerably. 

Fig. 22.1. Straight Line and Cubic Parabola of Closest Fit to the Data of Table 22.1. 


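Consider now the alternative approach by the use of orthogonal polynomials. By 
the use of equations (22.33) we have 

$$P_1 = \frac{1}{16}\begin{vmatrix}16 & 2642\\ 1 & X\end{vmatrix} = X - 165\cdot125,$$

$$P_2 = \frac{\begin{vmatrix}16 & 2642 & 474{,}050\\ 2642 & 474{,}050 & 91{,}244{,}582\\ 1 & X & X^2\end{vmatrix}}{\begin{vmatrix}16 & 2642\\ 2642 & 474{,}050\end{vmatrix}} = X^2 - 343\cdot137X + 27{,}032\cdot435,$$

$$P_3 = \frac{\begin{vmatrix}16 & 2642 & 474{,}050 & 91{,}244{,}582\\ 2642 & 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842\\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842 & 3{,}930{,}294{,}225{,}302\\ 1 & X & X^2 & X^3\end{vmatrix}}{\begin{vmatrix}16 & 2642 & 474{,}050\\ 2642 & 474{,}050 & 91{,}244{,}582\\ 474{,}050 & 91{,}244{,}582 & 18{,}553{,}164{,}842\end{vmatrix}} = X^3 - 522\cdot940X^2 + 87{,}182\cdot434X - 4{,}605{,}047.$$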

The b-coefficients are given by (22.37), the determinants in the numerator having been 
already tabulated in finding the P's. We have 

$$b_0 = 5\cdot1856, \qquad b_1 = \frac{2\cdot7409}{100}, \qquad b_2 = \frac{1\cdot0695}{100^2}, \qquad b_3 = -\frac{0\cdot91889}{100^3},$$

these being the values already found in arriving at (22.42) to (22.44). Thus 

$$Y = 5\cdot1856 + \frac{2\cdot7409}{100}(X - 165\cdot125) + \frac{1\cdot0695}{100^2}(X^2 - 343\cdot137X + 27{,}032\cdot4) - \frac{0\cdot91889}{100^3}(X^3 - 522\cdot940X^2 + 87{,}182\cdot4X - 4{,}605{,}047). \tag{22.45}$$

If we stop at the second term we have 

$$Y = 5\cdot1856 + \frac{2\cdot7409}{100}(X - 165\cdot125) = 0\cdot660 + 2\cdot741\left(\frac{X}{100}\right),$$

which is the same as (22.42), as of course it must be. Similarly, if we stop at the third or 
fourth terms we find equations (22.43) or (22.44). 

Now consider the fit of the regression line. We have from (22.35) 

$$b_p^2\,\Sigma(P_p^2) = n\,b_p^2\,\frac{\Delta^{(p)}}{\Delta^{(p-1)}} = b_p\,\Sigma(yP_p).$$

The determinants in this expression have already been evaluated in finding the regression 
line. Remembering that Σ(y²) = 459·436 we obtain the following: 

  j      b_j                   b_j² Σ(P_j²)      U (equation (22.28)) 
  0      5·1856                430·247           29·189 
  1      2·7409 × 10⁻²          28·390            0·799 
  2      1·0695 × 10⁻⁴           0·669            0·130 
  3     − 0·91889 × 10⁻⁶         0·080            0·050 

In calculations of this kind it is as well to take $b_j$ to an extra place of decimals, as the value 
of U is rather sensitive to small errors of rounding up. Even so, the last figure in U is 
unreliable. 

From the values of U it is clear that the fit is greatly improved by taking a quadratic 
term, and still further improved by adding the cubic term. How far a quartic term would 
improve matters cannot be decided without ascertaining the term. We have, however, 
not proceeded beyond the third degree because to do so would require moments of the 
eighth order. For a small population such as this, which in practical applications would 
be considered as a sample only, the errors in higher moments would probably be considerable. 

The reader who works through the arithmetic of this example will find that there is 
about the same labour involved in either method. It is in the fitting of higher order terms 
that the method of orthogonal polynomials shows its superiority. In practical cases it 
is preferable to avoid the large numbers arising from the evaluation of determinants by 
a modification of the procedure given in 22.27 below. 

Example 22,4. Gfrouped Data 

In Example 14.1 (vol. I, p. 331) we considered the correlation between age and highest 
audible pitch in 3379 subjects and found the linear regressions. Let us take the work 
a stage further. 
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STANDAED ERRORS OP REGRESSION COEFFICIENTS 

For the data of the table {X — age, F = pitch) we find — 

^(y) = - 708 ; E (y*) = 8894 ; E (yx) = - 12,636 ; 

E(x) = 2604 ; E (a:*) = 47,392 ; E (x*) = 387,498 ; 

E (a^) = 4,842,172 ; E (a*) = 62,401,794 ; E = 883,576,012. 

As a variation on the procedure of the previous example, we will convert these figures 
to moments about the mean (with Sheppard’s corrections) and put them in standard measure. 
We find— 


= - 0-209,629 ; = 2-604,904 ; 

= 0-770,642 ; = 13-348,229. 

In standard measure the other moments are 

jM, = 1-705,375 ; /a* = 6-295,759 ; 

/H = 20-729,861 ; //* = 78-409,775. 

We may now use equations (22.38), etc., direct, and find 

Po = 1, Pi = X, Pi = X* - 1-706X - 1, P, = X* - 3-471A:* - 0-376A: + 3-660. 

We now require the moments fin and We find 

E{yx^) = - 112,495 
E (yx^) = - 1,399,639, 

and hence, with Sheppard’s corrections and in standard measure, 

fiti === - 1-177,920 /jiii = - 4-216,958. 

We now find, from (22.37), 

6 « = 0 

6, =- - 0-613,626 

5, == — 0-066,064 

6, = 0-010,205. 

The regression line of the third degree is then 

r = _ 0-6136A: - 0-0551 (X* - 1-705X - 1) + 0-0102 (X» - 3-471X* - 0-376X + 3-660), 
where the origin is at the mean and the units are in standard measure. 

StaTH^rd Errors of Regression Coefficients 

22.15. The standard errors of unknowns derived from least squares can be found 
by the use of a result due originally to Gauss. Suppose oq is the true value of and the 
residuals y — E<Xj3f arc distributed normally with variance v. Writing da^ = etj — 
we have for the frequency function of the residuals — 

oc exp — ^y — + EE 
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iE denoting summation over the sample and S over the values Oo to a^, and the cross* 
\« i 

term vanishing because the a’s are minimal values) ; 

oc constant x exp — ^ E E (do^ 


2» , f 

oc exp —'^EE {dOj da/c 

2t> a i,k 

OC exp - ^ 2 : (d% do* //,+*). 


(22.46) 


In the limit, then, the deviations are distributed in the bivariate normal form, and from 
the results of 15.12 (vol. I, p. 376) it follows that 


var o^ 


V 


(22.47) 


for the determinant whose terms are is in fact the determinant we have already defined 
as id****, and djj* is the minor of the item in the jth row and column. 

Now V is the variance of deviations from the theoretical regression line, and in terms 
of variations about the observed line we have, remembering the result of 18.17 — 

v.ra,=^.- , (22.48) 

^ J(l») » — P — 1 

Since the correlation ratio of y on a; is given by 

var c = var y (1 — >?*), 

we have also 


var 


a = ^ (1 - r]*)yary 
^ j(p) n —p — I 


(22.49) 


For large samples the replacement of » by n — p — 1 in the denominator is an unnecessary 
refinement. 


22.16. For the case of orthogonal polynomials the results apply with a slight but 
important simplification. The coefficient bf is the same as if polynomials up to order j 
only are fitted, and hence, since we have 


_d«-«(l -»?*)vary 
n-T-l ■ 


(22.60) 


(22.51) 


The same result follows by modifying (22.46), which for orthogonal polynomials becomes 

/ oc exp — JEP^ (db/)*l 

2V j \a J 

showing that the b’s are independently and normally distributed with variance 

var bj 

reducing to (22.60) in virtue of (22.36). 


EP¥ 


22.17. If the parent population is normal, ri =‘ p, and the determinants can be 
evaluated explicitly in terms of the variance of x. In fact, 

jw-i) 1 

'W ^j\(yacxf . • . • • (22.62) 
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tod hence 


or, in standard measure, 


var hj 


^ (1 — p^) vary 

n — j — 1 j\ (var x)^ 


var 6 ^ 



. (22.53) 
. (22.54) 


Equation (22.52) can be found by evaluating the determinants in the ordinary way, but 

it follows more simply from the consideration that is equal “ - 2 * which, in the 

normal case, is for large samples equal to E (P|) = J ! (var x)^ (6.22* vol. I, p. 147, with 
a change of scale). 


22.18. The advantages of using orthogonal polynomials instead of powers of X 
are apparent in the forms taken by the standard errors of the coefficients a and 6 . The 
latter are independent of the order of the polynomial fitted and can be tested once and for 
all. The former do not possess this advantage. It seems preferable, therefore, as a matter 
of technique, to work with orthogonal polynomials throughout, whenever regressions of 
order higher than the first are likely to require investigation. 


Example 22.5 

Consider again the data of Example 22.4 (regression of highest audible pitch on age). 
We have there expressed the regression line in standard measure and in the orthogonal 
form, and may therefore use equation (22.60) in the form 

1 ^^ 2 ^( 0 ) 

var = L 

n 


var 6 , = 


n 


var 6 . = 


n 




(The sample number n is so large that we can ignore the element — {j + 1) in the divisor.) 
The determinants required are already known, having been ascertained in the course of 
the work. We have 




j(i) 


= 0-4189, 


J(2) 

J(3) 


0-0986. 


We also require rj, which was found in Example 14.11 (vol. I, p. 362) to be = 0-6231. 
Thus 1 - = 0-6117. We find 


var bi 


1-8104 
10 ^ ’ 


var 6 * 


0-7684 
10 ^ ’ 


var 63 = 


0-1783 
10 ^ ‘ 


The values of the 6 ’s and their standard errors are then 


Order, 

b. 

Standard Error. 

1 

- 0-6136 

0-013 

2 

- 0-0561 

0-0087 

3 

0-0102 

0-0042 
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In all oases we should judge the coefficients significant, as being more than twice the standard 
error. Although, therefore, the second* and third-order terms are small and the regression 
is approximately linear, the deviation from linearity is not merely a chance effect. 


SxwA Signijkance Teats in the Normal Case 

22.19. When the parent population is normal, more exact tests than those derived 
from the use of standard errors may be obtained. We have already seen (14.21, vol. I, 
p. 348) that a function dependent offiy on sample values and the first regression coefficient 
hi was distributed in " Student’s ” form. We proceed to generalise this result. 

Consider in the first place the linear regression equation 


r = y + 6,(X-f), (22.66) 

and let fit be the population value of hi and ar| the variance of y in the population. Since 
the parent is normal, the variance of y for any fixed value of x is (r|. 

Our estimate of 6^ is 

= 

where summation takes place over the sample values. Thus for fixed values of x we have — 


var 6i 


S{x — x)* var y 


ol 

S(x — »)*' 


. (22.57) 


Thus, since the mean of the distribution of bi is /3i, we see that, for samples having the 
same x’a as those observed, hi is normally distributed about mean j9i with variance given 
by (22.67) — ^normally because it is a linear function of the y’s which are themselves normal. 
Consequently, 


(^1 -/?.) V2^(x -x)» 


. (22.58) 


is distributed normally about zero mean with unit variance. 

If Of were known this would provide a test of significance of &i in the ordinary way ; 
but in fact Ui is not known and the substitution of an estimate distributed in the Type III 
form brings in the t-distribution in the usual way. We take as our estimator of ot the 
function s, where 

s»==^ i:(y-rr (22.69) 

n — 25 


amd F' represents the values “ predicted ” by the regression line, that is, the values 

F' -hi(x -X). . . . . . (22.60) 

Thus s* is based on the sum of squares of residuals. We shall show presently that a* is 
distributed in the Type inform with n — 2 degrees of ficeedom independently of 6i — jSi. 
It follows that 

. (6i - fit) Vi^(* - 5)* V(n-2) 

V2;(y-Fr 

is distributed as “ Student’s ” t with r = n — 2. 

A given value fii may be tested accordingly. But we notice that the inference is a 
conditional one, that is to say, we are considering the distribution of t in a sub-population 
for which the x’s are the same as those actually observed. (Cf. 21.47.) 
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22.20. To establish the foregoing result we have to show that £ {y — Y')*, the sum 
of squares of residuals about the observed regression line, is distributed in the Type III 
form with n — 2 degrees of freedom. This is a particular case of a general theorem we 
shall prove at the beginning of the next chapter, but we will sketch an ad hoc proof hero 
for the sake of completeness. 

Since the population is normal, the deviations of y from the true regression line for 
fixed a;’s, Y = j5o + (.X — x), where /?, is the parent mean of y, is normal with variance 

of. Now 


-2)-i = 


-E(y-Yr=^±£{y-b,-h,(x 

Oo (To 


X) }* 


= 4 — /Si (a: — *) — (6* — /?,) 


{bi -Pi)(x -*)}*. 
The coefficients 6© and 6i were chosen so as to minimise this sum, and hence 


{n-2)l^=±£{y~fi,-P,(x-x)}» 


n 


(6. - i?.)* 


(6i 


{x - x)*. (22.61) 


The first term is the sum of squares of n normal variates with zero mean and unit variance ; 
the second is also such a variate, for it is the square of the deviation of the mean of y about 
its true value divided by the variance al/n ; and the third term is also such a variate, as 
shown above. 


It does not follow immediately that 


(n - 2) 


al 


is distributed as the sum of squares of 


n -- 2 normal variates in standard measure, for the constituent items might be correlated. 
Let us then find an orthogonal transformation to new variates Si ••• in linearly related 
to the n normal variates y — Pq — fit {x — x). These also will be normally and inde- 
pendently distributed. In particular (remembering that our summations refer to the 
y's and a;’s, but the latter are constant for our distributions), take 

= (*-*)} 


o^y/n 

= X^(6,-^o) 

(7a 


= A r r 

L 




/3i) \'£ (x - J)*. 


-*) }] 


and fa normal variates in standard measure. Moreover they are orthogonal since 




X ~ X 


a\ y/H {x 

= k E {x — x) 

= 0 . 




n 

Consequently our transformation exhibits the first term on the right in (22.61) as ^ if and 




the second and third as if and if. Thus the total is distributed as ^ if, which is the 
result required. 


i-s 
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We may oompaie the result of 18.17 — ^in which we saw that the mean veAue of e* 
was n, whereas that of e* was n —jp — \, one d^iee of freedom having been lost ih the 
sum of squares of residuals for every constant estimated — and the approximate result of 
21.20 in which ;(*.had to lose a degree for each constant fitted by maximum likelihood. 
Fundamentally all these results are different aspects of the same thing and rest on the fact 
that the variation of the sum of squares of normal variates in standard measure is spherically 
symmetric, so that a hyperplane in the sample space “ cuts ” the distribution in a spheri- 
cally symmetric form of one lower degree of fi^dom. 


Extension to Curvilinear Regression 

22.21. The foregoing result can be extended without difficulty to the case when 
the regression is curvilinear. If 

r = 6o Po + Ri + • . ‘ + bp Pp, 
where the P’s are orthogonal, then 

* 2PI ’ 

and we have also, for the variance of when the x’a are fixed, 

. of 

SO that 

yrp? 

is distributed normally with zero mean and unit variance. Taking as our estimate of a% 

^ Siy-Yr, 


a* = 


we see, as before, that 


n —j ~l 


. (22.62) 


is distributed as “ Student’s ” t with v = n — j — 1 degrees of freedom. 

It will be observed that in this and the previous section we have not assumed anything 
about the distribution in a;-arrays. We have merely supposed that for any given a:, y is 
normally distributed with constant variance. 


Example 22.6 

C!onsider again the soil data of Example 22.3. We found, for the cubic term in the 
parabola, a coefficient of — 0*9189 x 10~*. Is this significant ? 

Here bj — = — 0*9189 x 10“* for J = 3 ; 

V(n -j -1) = V(16 - 4) = 3*464. 

We have already found E{y — Y')* = U, namely 

V = 0*060. 

We further require E P| which has been obtained incidentally in the working of Example 
22.3 and is equal to 9*31625 x 10^<*. Hence 

_ 0*9189 X 10-» (3*464) 3*062 X 10» 

0*2236 

= 4*3. 

This is highly significant. 
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CASE OF VARIATE WITH EQUIDISTANT VALUES 

Oa^e .when the Independent Variate proceeds by Equal Steps 

22.22. An important special case arises when the independent variate has values 
which are equidistant, as, for instance, in most time-series and in grouped data. If we take 
the interval between successive values of x as our unit, the variate-values may, by a suit- 
able choice of origin, be taken as 0, 1, 2, . . . ri — 1. The various moment-functions 
entering into the expressions for poljmomials, etc., may be written down once for 
all. Furthermore, this case lends itself to simpler summatory methods of forming the 
actual polynomial values and the residuals. 


22.23. For a set of values 0, 1, 2, ... n — 1, we have 
E (®) = E (**) = ~ 


etc. 

4 


- 1 


i (» — 1 ). = “l2~’ ^ 


From (22.38) and similar equations we then find 


(22.63) 


p X* (It— X/Xt — fil pg »* — 1 f 

P. J 

and so on. The polynomials may be obtained more systematically as follows : — 

We show first of all that 

where is the jth terminal difference of and the range from 0 to w — 1. In fact, 
from Newton’s interpolation formula, 




and since the P’s are orthogonal, 

X{x + q — Pp = 0, g <p. 

X 

Substituting from (22.65), we find for the term in Pp — 

E{x+q- ~Pp = X{(x-\- qfo^i) -{x + q- ^ Pp 


. (22.65) 


( 22 . 66 ) 


Thus for all q from 1 to p we have 


' "O' + gli! ” 


^ (n + g- At p 
(» — !)! \ j ) j + g *” 

whence follows (22.64). We now find functions obeying these conditions. 
Consider 


g-l)!„/«-l\ /ii p ^ 


y = C {x + p)^K 


. (22.67) 
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Thisisapolynomialofdegree j>, andiffora; = 0, 1, . . . p it aasomes the values 
we have — 


y (*) = <7 »!»>+« ^ ■\ 

-J)^x -J) 


( 22 . 68 ) 


for this also is of degree p and has the right values at x = 0, . . , p. Taking now 


= (» - 1) t (P -i) i (_ ijp-# p 


. (22.69) 


we find that for a: = — g 


y - o (P + yP«> (- ^ 


. (22.70) 


Now from the definition of y this clearly vanishes for — a; = g = 1, . . . p, and* thus 
(22.70) is zero. Comparing it with (22.64) we see that the conditions are satisfied if we 
give to the value of of (22.69), i.e. 

, . . ( 32 . 71 , 

The constant O is evaluated by the fact that the coefficient of X** in Pp is unity, giving 
d" Pp = p ! This gives 

(7 = P_':_ . .... (22.72) 

(2p)! (»-p-l)! ^ ^ 

Finally, substituting in (22.65), we find 

^ + (22.73) 

where by convention the term is unity for J = 0. The first six polynomials are 


P. = P! 


p _ ps „ p 

P - P 4 _ 3n* - 13 p. , 3 (a* - 1) (^- 9) 

p _ ps _ 8 {»* - 7) p, , 15n* - 230»* + 407 p 
* 1 18 I + 1008“' ’ ^ 

p _ JM 8 “ 31) P4 j 8»* — 110»* + 329 pg 

p, _p, p, + p, 

6 (n* - 1) (ra* - 9) (n* - 26) 

14784 


(22.74) 
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Four more values are given by Allan (1930), to whom the above derivation of (22.73) 
is due. 

Values of the polynomials up to and including the fifth are given in Fisher and Yates’ 
Statistical Tables up to w = 52. 


22.24. We can now find an explicit expression for Since the polynomials 

are orthogonal we have 

which, by the argument resulting in (22.64), leads to 


P* _ ^ (« + P)'- p 

Z. - / - 1) ! p ^3 + 1 *'• 


i-0 


Putting g = p + 1 in (22.67) and (22.70), we have 


y(-q)=:C( - l)f^»l = (- 1)^^ (2p -f l)^^+*i 

whence, after a little rearrangement, 


SC7‘)f 


p 

+j + 1 


£ (” + y ) ! 

jl (n -j - 


^ (P !)* (n + p ) ! 

l)!l>+i + l (2p + 1)! (n - 1)! ’ 


and thus, substituting for C from (22.72), we find 




(P n In^ — 1) 

(2p) ! (2p + 1) ! ^ ^ * 


(n2 -- 


. (22.75) 


22.25. 

differences. 


It is also possible to express the orthogonal polynomials in terms of central 
We quote without proof the results (for details of which see Allan, 1930) : — 


where 




Pi 

(p-~W 


[M'Pi^ 


(- iy(p - j -i)i 

(p - 2j ) ! j ! 2-^J 


[PjP-2y-i 


[xY = 


{a;+ ^(w — 1 ) } ! 
{a: — i (» - i)}! ■ 


(22.76) 

(22.77) 


The series is summed from j — 0 until 2j > p, when the denominator vanishes and (jp — i) ! 
is written for r{p + J) to preserve the factorial notation. In practice the polynomials 
for particular examples are not determined from (22.73) or (22.76) but by the use of tables, 
or by summation from differences in the manner of Example 22.9 below. 


Example 22.7 

For the fitting of a regression line in the case of equidistant intervals various methods 
are in use. A choice between them depends on the length of the series, the order of regres- 
sion to which it is desired to go, and the computing resources at the investigator’s disposal. 
We will illustrate two methods in this and the next example. 

M 


A.S. — ^VOL. U. 
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TABLE 22.2 

FitHng of BegreasUm Lint by Orthogowd PolynomiaU — Equidistant x~iiUervcds. 


(1) 

Year. 

(2) 

Variate. 

Px 

(3) 

Population 

(million)! 

Y 

. (4) 

p. 

(6) 

JP. 

V 

(6) 

T^Px 

1811 . . 

- 6 

1016 

22 

~ 11 

99 


1821 . . 

-- 5 

1200 

11 

0 

- 66 


1831 . . 

- 4 

13-90 

2 

6 

- 96 


1841 . . 

- 3 

16-91 

- 6 

8 

- 64 


1851 . . 

- 2 

17-93 

- 10 


11 


1861 . . 

- 1 

20-07 

- 13 

4 

64 


, 1871 . . 

0 

22-71 

- 14 

0 

84 


1881 . . 

1 

26-97 

- 13 

- 4 

64 


1891 . . 

2 

29-00 

- 10 

- 7 

11 


1901 . . 

3 

32-63 

- 6 

- 8 

- 64 


1911 . . 

4 

36-07 

2 

~ 6 

- 96 


1921 . . 

6 

37-89 

11 

0 

- 66 


1931 . . 

6 

39-95 

i 

22 

1 i 

11 

99 

1 



In Table 22.2, column 3 shows the population of England and Wales (in millions) 
for the years shown in column 1. These are at ten-yearly intervals, and the variate-values 
in units of 10 with origin at the mid-point of the range are given in coliunn (2). These 
ate the values of Pi. 

The corresponding values of Pj, P, and P4 are given in the last three columns. They 
may be calculated direct from (22.74), but are most conveniently taken direct from the 
Fi^er-Yates tables. 

We find, for n = 13, 


E YPi = 474-77 
E YP, = 123-19 

E YP, = - 39-38 X 6 = - 236-28 
E YPt = - 374-30 X = - 641-667,143, 


and, direct from the tables, 

EPl = 182, EPt = 2002, EPl = 672 X 36, 

E PI = 68,068 X (-V*^)». 

y/ 2J YP 

Hence, from equations of the type bi = 

bi = 2-608,626, 6, = 0-061,633,467, 6, = - 0-011,474,369, 64 = - 0-003,207,699 

and the quartio curve is 

y Y - 24-1608 = 2-6086JC -+• 0-061,63 {X* — 14) - 0-011,47 (Z» - 25 X) 


- 0-003,208 ^ X* + 144^ 


(22.78) 


We can now find the residuals for each term in this equation. We find 

X r* = 8839-9389 
EY =314-09. 
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Hence the sum of squares of Y about the mean of Y, 

- r)* == 1261-283. 


Thus we have : — 


"'I 

Residual Sum of Squares. I 


j Original variation I 1251*283 

I Contribution of first term == bi 2:(YP^). . . . 1238*497 

I Contribution of second term = 63 £ (YP^) . . | 7.58O 

I Contribution of third term = 63 2] (YP^) . . ■ 2*711 

I Contribution of fourth term = Z (YP^) . ! i 2*068 

: i 


12*786 

5*206 

2*405 

0*437 


For the variance, of the residual elements we divide by the number of degrees of freedom 
{n — j — 1) and obtain 


Residual Sum of Squares. 

Divisor. 

Residual Variance. 

1 

12*786 

11 

i 1162 

5*206 

10 

0-621 

2*495 

9 

! 0-277 

0*437 

8 

j 0-055 


Fig. 22.2 shows the data graphically with the cubic and quartic of closest fit. 



Fio. 22.2. — Cubic (full line) and Quartic (broken line) Parabolas fitted to the Data of Table 22.2. 


The fit is evidently a good one, as is borne out by the smallness of the residual variance, 
but we must sound a warning as to the use of this polynomial. For interpolation in the 
variate range it would probably suit very well ; but for extrapolation outside the range 
it is dangerous unless there is good reason to suppose that the polynomial has some theoretical 
basis (which is not so). It would, for instance, be most unsafe to try and estimate the 
population in 1900 by inserting X = 9 in equation (22.78). 



164 


REGRESSION 


Example 22.8 

In Chapter 3 it was seen that factorial moments can be derived by summatory pro- 
cesses. A somewhat similar method can be used to fit orthogonal polynomials. We will 
illustrate it on the data of the previous example. 

TABLE 22.3 

Fitting of Orthogonal Polynomials by Factorial Sums. 


S. 

«x i 

1 

1016 

10-16 i 

1200 

22-16 

13*90 

36-06 j 

15-91 

61-97 1 

17-93 

69-90 ; 

20-07 

89-97 i 

22-71 

112-68 

26-97 

138-65 

29-00 I 

i 167-66 

32-53 

1 200-18 

36-07 

1 236-25 

37-89 

1 274-14 

39-95 

1 314-09 

i 

314-09 

! 1723-86 


5 , 

! 

10-16 

1 

10*16 

32-32 

i 42-48 

68-38 

110-86 

120-35 

231-21 

190-25 

421-46 

280-22 

! 701-68 

392-90 

! 1094-58 

531-55 

i ' 1626-13 

699-20 

1 2325-33 

899-38 

1 3224-71 

1135-63 

4360-34 

1409-77 

6770-11 

1723-86 

7493-97 

7493-97 

• 

i 


In Table 22.3 the column headed /So gives the value of Y, The next column, headed 
8u gives the sums of the values in the first column proceeding from the top ; and so for 
the columns headed S^ and /S,. 

Now construct the quantities 


O, = - ,S, = = 24-160,769 

n 13 

2! „ 2(1723-86) 

^ - - = 18-943,616 


» (n + 1) 


182 


. S. _ _ 16 470,264 

n (n + 1) (» + 2) 2730 


the general formula being 


Then obtain the quantities 


„ _ 0 + 1 )!^, 

^ n(n + 1) ... {n + j)' 


Oq = «o == 24-160,769 

oi = Oo — Ui = 5-217,253 

Oj = a* — 30i + 2o, = 0-270,749, 


the general formula being 


«•=«.- + ( y-l) . (p)(y + l)(j> + 2 ) _ 

» * (1 !)* 2 ^ ^ (2 !)* 3 * 


. (22.79) 


. (22.80) 
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Finally put 


b,=ao = 24160, 769 




6 


» — 1 


a, = 


_ 6 (6-217,253) 


12 


= 2-608,626 


6.= 

the general formula being 


30 

1) in - 2) 


30 ( 0 . 270 , 74 ») _ 

^ 1 oo * ’ 


132 


b = • /no QlX 

^ (p !)2 (n - 1) . . . (n -p) 

Then the 6’s are the coefficients of the orthogonal polynomials in the regression equation. 
The values we have found check with those of the previous example and the reader may 
care to work out ft, 64 by the same method. 

This process is due to R. A. Fisher and avoids the direct calculation of the values of 
the orthogonal polynomials. Its validity may be established by using equations (22.75) 
and (22.73), which give 

b (2p!)(2 p + l)! 

^ EPl lp\yn(n^ - 1) , , , (n^ - p^) 

( 2 p +i) ! _ y ( ' o’+i) 1) 

ip !)2 (n-1) . . . (n -p) j (j !j2 (p-j) ! (j + i) (n^p-^l) ! n . . . {n+p) 

The first part of the expression explains the coefficients in (22.81), the second part those 
in (22.80). The third part gives rise to (22.79) when it is remembered that the sums 8 
are expressible as sums of factorials (cf. 3.10, vol. I, p. 58), but the summation takes place 
from the top of the column. 


Example 22.9 

As a rule it is unnecessary to evaluate the polynomial at all the points for which data 
are given ; but if the values are desired for comparison with observation they may be 
obtained by summatory processes from the differences. 

The terminal differences themselves are obtainable simply from the quantities of 
the previous example. For a polynomial of the first degree we have 


AY = -- 


6 


For that of the second degree, 

y = 


n — I 
Y = Uq -}- 3a^. 

60 


ai 


AY = 


a^ 


(«! + 5a2) 


For the third degree, 


y == 

A^Y = 


(n — 1) (n — 2) 
_ 6 

Y = CTq “ 1“ Scstj — 5a2« 
- 840 


(n — 1) (n — 2) (n — 3) 
60 


ox 


jy = - 


(n — 1) (n — 2) 
6 


(a^ + la^) 


(ai + 5ai + 14ai) 


n — 1 

y = a^ -)- 3a| + 5a2 + 7a3. 


. (22.82) 


. (22.83) 


. (22.84) 
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The formulae for higher degrees are constructed on analogous lines, the multiplying 
factors for successive differences being given by 

/ 1) (i? + 2) • • . (2p + 1) 

' ' - . . ■ (n-p) 

and the coefficients of the a’s by 


Y 

1 

3 

5 

7 

9 

11 

AY 


1 

5 

14 

30 

65 

A* Y 



1 

7 

27 

77 

d» Y 




1 

9 

44 

d« Y 





1 

11 

d» Y 






1 


We leave the proof of these results to the reader. 

For instance, for the data considered in the two previous examples we found, for the 
parabola of the second degree, 

r = 24-160,8 + 2-0O8,6X + 0 061, 633 (Z» - 14) 

= 24-160,769 ; ai = 6-217,263 ; = 0-270,749. 

Hence, from (22.83), 

J* r = , Ua = 0-123,068 

(n — 1) (» — 2) ^ 

d F (ai + 6aa) = — 3-285,499 

n — 1 

y = a; + 3ai + 6a; == 41-166,273. 

We then build up the poljmomial values as shown in Table 22.4. The second difference 
0-123,068 is shown at the foot of column (2). Being a constant, it could have been written 

TABLE 22.4 


Calculation of Polynomial Values from Differences, 


(1) 

(2) 

(3) 

(4) 

(6) 

(6) 

Number of 

Second 

First 

Polynomial 

Observed 

Difference 

Term. 

Difference. 

Difference. 

Value. 

Value. 

(6H4) 

1 


- 1-808,68 

9-863 

10-16 

0-297 

2 


- 1-931,76 

11-796 

12-00 

0-206 

3 


- 2-064,82 

13-849 

13-90 

0-061 

4 


- 2-177,88 

16-027 

16-91 

- 0-117 

5 


- 2-300,96 

18-328 

17-93 

- 0-398 

6 


- 2-424,02 

20-762 

20-07 

- 0-682 

7 


~ 2-647,09 

23-299 

22-71 

- 0-689 

8 


- 2-670,16 

26-969 

26-97 

0-001 

9 


- 2-793,23 

28-763 

29-00 

0-237 

10 

1 

- 2-916,29 

31-679 

32-63 

0-861 

11 


- 3-039,36 

34-718 

36-07 

1-362 

12 


- 3-162,43 

37-881 

37-89 

0-009 

13 

0-123,068 

- 3-286,499 

41-166,27 

39-96 

- 1-216 


all the way up, but, to do so is a waste of time (and in practice, of course, we should not 
devote a separate column to it). The first difference is shown at the foot of column (3), 
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and the figures above it constructed by adding the second difference at each stage. The 
polynomial values themselves are compiled by adding the first differences to the value 
at the foot of the column, 41*166,27. 

We have also shown the observed values and the difference between polynomial and 
observed values. The sum of squares of the latter is 6*204, agreeing within the margin 
of rounding-up error with the value for the sum of squares of residuals found in 
Example 22.7. 

As an exercise the reader should work out the polynomial values for the third- and 
fourth-order polynomials and compare the sum of squares of residuals with the values of 
Example 22.7. 

Multiple Curvilinear Regression 

22.26. We considered the linear regression of one variate on a number of others 
in Chapters 14 and 15. There now remains the extension of our results to the 
curvilinear case. 

The extension is very easy to carry out when we remember that in multiple linear 
regression there is no restriction on the degree of dependence among the “ independent ” 
variates. In particular, some of them may be functionally related, and more particulairly 
still, one variate may be a power of another. It is thus clear that the process of fitting 
curved regression lines can be regarded as formally equivalent to that of fitting linear 
regressions. For instance, the fitting of 

y =s -j- (l\ Xx X 2 ”4” Us ^3 4" U4 2^4 4- Us -^5 

is equivalent to 

Y = 4“ Ui JTi 4" Uj + tts Zi 4“ U4 4* U5 Z\y 

the latter being a particular case of the former where Xs is the square of Xi (and their 
covariation accordingly complete) and similar relations exist between X3, X4 and 

The case of curvilinear regression for a single variate, which has occupied the fore- 
going part of the chapter, could then have been treated by the methods of Chapter 16. 
We have discussed it afresh only because it is more easily dealt with by direct methods. 

22.27. In multiple regression analysis it sometimes happens that, having worked out 
a regression equation, we wish either to take account of a new factor or to remove one 
which appears redundant. To avoid the necessity of solving a new set of determinantal 
equations the following device is useful : — 

Consider the case of three independent variates measured from their mean 

Y == bx Xx + 63 X* 4 - 63 X3 (22.85) 

In accordance with our general method the constants b are given bv 

bx S (X?) + 63 X (Xx x^) + 6, X (Xx X,) = X (Xx y) Y 

bx X (Xx x^) + 6, X (xl) +b,Z (X3 X,) = X (X3 1/) y . . (22.86) 

bx X (xx Xs) 4" 6a X (X 2 X 2 ) 4“ 63 X (ajj) = X (x^ y) 

Suppose now we replace the functions X (xy) on the right by 1, 0, 0 and obtain the solutions 
bx = Ch, 62 = Ci2> 63 = C18 ; and similarly for replacement by 0, 1, 0 and 0, 0, 1, 
the solutions being written 

bx = Cii, C12, Cisl 

62 = Ci2> ^22* C23 >...... ( 22 . 87 ) 

63 = Ci3, C23, C33J 
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Then the solution of (21.86) is 

6, = c„ E (*i y) + c„ E (Xt y) + c„ E («, y) 'j 

= Cit ^ (*1 y) + Cft .Z" (a;* y) + Cn E (aj* y) / . . • (22.88) 

6* = Cjj Z* (asi y) + Cu E {Xt y) + ^ (afs y) J 

as is immediately evident on substitution. The values of the c’s are those we have denoted 

earlier in the chapter by determinantal forms, e.g. Cjk = 


22.28. Now suppose that we wish to discard the variate x,. From (22.86), 
1, 0, 0 written on the right, we find 


witn 


where (jk) stands for E {Xj x^), and 



1 ! 

m 

(13) 

1 

= — 

A \ 

(f2) 

(23) 

0 


ZJ 

(13) 

(33) 

0 

.), and 

• 





(11) 

(12) 

(13) 

A = 


(12) 

(22) 

(23) 



(13) 

(23) 

(33) 


(22.89) 


(22.90) 


1 I 

There are similar expressions for the other c’s. If the values of the constants when a:» 
is removed are cjj, C 22 we shall hwe 


— — Ti 


(12) 

1 / 

c' _ 1 1 (11) 1 

(22) 

q ' ! ’ 

“ A’ 1 (12) 0 


etc. 


where 

Now we have 




(11) 

( 12 ) 


(12) 

(22) 



(11) 

(12) 

1 


(11) 

(12) 

0 


(12) 

(22) 

0 


(12) 

(22) 

1 

Cl3 ^23 

(13) 

(23) 

0 


(13) 

(23) 

0 

C83 


(11) 

(12) 

0 




A 

(12) 

(22) 

0 





(13) 

(23) 

1' 




(12) 

(22) 

(11) 

(12) 

(13) 

(23) 

(13) 

(23) 

AA 



Thus 

Cis 


^is ^28 _ ^12 ^88 ^18 ^ 23 

C33 C33 


_ 1 (12)d- 

1 Jd' 

’ ^12* 


Jd' 


(22.91) 

(22.92) 


(12) 

(23) 

(11) 

(12) 


(12) 

(22) 

(11) 

(12) 

(13) 

(33) 

(12) 

(22) 


(13) 

(23) 

(13) 

(23) 


(22.93) 
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Similarly 




^13 

^33 


^2S 


*^33 


. (22.94) 


(22.95) 


This gives us the new c’s in terms of the old. Denoting similarly the new 6’s %y primes^ 
we have 


*1 - K = (c„ - c'J £ (x^ y) + (c,g - cj E (x, y) + c„ £ (x, y) 

= — { C|3 £ (Xi y) + Cu c.s £ (ajj y) + Cjs Cj* £ (xj y ) } 

Caa 

= 

^33 

Hence we have 


- h, 
= 62 


,6a^ 


Cas 
£23 
Caa ■ 


expressing the new constants in terms of the old and the known constants c. 
Finally, the contribution to the sum of squares due to the variate is 

61 £ (*1 y) + bi£ (Xi y) -f 6, £ {x» y) —b\£ (x, y) - b\ £ (a:^ y) 


. (22.96) 


= 6, £ (xt y) + 6, £ (aij y) 6, £ (x, y) 

C35 C33 

_ Vi 

C3J 


(22.97) 


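22.29. Generally, if there are $p$ independent variates the equations for the $b$'s are

$$\begin{aligned} b_1\Sigma(x_1^2) + b_2\Sigma(x_1x_2) + \dots + b_p\Sigma(x_1x_p) &= \Sigma(yx_1) \\ &\;\;\vdots \\ b_1\Sigma(x_1x_p) + b_2\Sigma(x_2x_p) + \dots + b_p\Sigma(x_p^2) &= \Sigma(yx_p). \end{aligned}$$

If $x_p$ is omitted the equations become $p - 1$ in number, in the variables $b'_1, \dots, b'_{p-1}$. Subtracting from these the first $p - 1$ of the above equations we find $p - 1$ equations, typified by

$$(b'_1 - b_1)\Sigma(x_1x_j) + (b'_2 - b_2)\Sigma(x_2x_j) + \dots + (b'_{p-1} - b_{p-1})\Sigma(x_{p-1}x_j) - b_p\Sigma(x_jx_p) = 0. \tag{22.98}$$

But these equations are the same as those for the coefficients $c_{1p}, \dots, c_{pp}$ (which, for $j = 1, \dots, p - 1$, also have zero on the right and so determine those coefficients only up to a common factor), with $(b'_1 - b_1)$ in place of $c_{1p}$, etc., and $-b_p$ in place of $c_{pp}$. Hence

$$\frac{b'_i - b_i}{c_{ip}} = -\frac{b_p}{c_{pp}}, \quad\text{or}\quad b'_i - b_i = -\frac{b_p\,c_{ip}}{c_{pp}}. \tag{22.99}$$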
Similarly it will be found that

$$c'_{11} = c_{11} - \frac{c_{1p}^2}{c_{pp}}, \qquad c'_{12} = c_{12} - \frac{c_{1p}c_{2p}}{c_{pp}}, \tag{22.100}$$

with similar equations for the other $c$'s.
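The updating rules (22.99) and (22.100) can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming that the $c$'s are held as the inverse of the matrix of sums of products $\Sigma(x_ix_j)$, with all variates measured about their means, and that the variate to be removed is the last one; the function name `drop_last_variate` and the simulated data are illustrative only.

```python
import numpy as np

def drop_last_variate(C, b):
    """Remove the last variate x_p from a fitted regression.

    C : inverse of the p x p matrix of sums of products S(x_i x_j)
    b : the p regression coefficients
    Returns the new c's, the new b's and the contribution b_p^2 / c_pp
    of x_p to the regression sum of squares.
    """
    p = C.shape[0] - 1                       # zero-based index of x_p
    c_ip = C[:p, p]                          # c_1p, ..., c_(p-1)p
    c_pp = C[p, p]
    C_new = C[:p, :p] - np.outer(c_ip, c_ip) / c_pp   # (22.100)
    b_new = b[:p] - b[p] * c_ip / c_pp                # (22.99)
    return C_new, b_new, b[p] ** 2 / c_pp             # cf. (22.97)

# Illustrative check against re-solving the normal equations directly
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
y = X @ np.array([1.0, -0.5, 0.3, 2.0]) + rng.normal(size=30)
X -= X.mean(axis=0)                          # measure variates about their means
y -= y.mean()
S, Sy = X.T @ X, X.T @ y
C = np.linalg.inv(S)
b = C @ Sy                                   # b_i = sum_j c_ij S(x_j y)
C3, b3, ss_x4 = drop_last_variate(C, b)
print(np.allclose(b3, np.linalg.solve(S[:3, :3], Sy[:3])))
```

The update needs only an outer product, whereas re-solving requires a fresh inversion of the reduced matrix; this is the saving the method offers.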


22.30. Somewhat similar results apply when a variate is added. If primes again refer to the new coefficients when $x_q$ is added, we have, as above,

$$b'_1 - b_1 = \frac{b'_q\,c'_{1q}}{c'_{qq}}, \qquad c'_{11} - c_{11} = \frac{c'^{\,2}_{1q}}{c'_{qq}}, \qquad\text{etc.} \tag{22.101}$$


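In order to use these equations to adjust the constants we require the $c'_{iq}$ and $b'_q$.

By writing down the equations satisfied by $c'_{i1}, \dots, c'_{iq}$ and subtracting the corresponding equations in $c_{i1}, \dots, c_{ip}$, we get $p$ equations such as

$$(c'_{i1} - c_{i1})\Sigma(x_1x_j) + \dots + (c'_{ip} - c_{ip})\Sigma(x_px_j) = -c'_{iq}\Sigma(x_jx_q), \qquad j = 1, \dots, p.$$

These are the same as the equations in $b_1, \dots, b_p$, with $-c'_{iq}\Sigma(x_jx_q)$ instead of $\Sigma(x_jy)$ on the right, and hence

$$c'_{ik} - c_{ik} = -c'_{iq}\sum_{j=1}^{p} c_{kj}\,\Sigma(x_jx_q).$$

Thus, using (22.101),

$$\frac{c'_{kq}}{c'_{qq}} = -\sum_{j=1}^{p} c_{kj}\,\Sigma(x_jx_q). \tag{22.102}$$

The last of the equations satisfied by $c'_{qq}$ is

$$c'_{1q}\Sigma(x_qx_1) + \dots + c'_{pq}\Sigma(x_qx_p) + c'_{qq}\Sigma(x_q^2) = 1.$$

Substituting for $c'_{1q}$, etc., in terms of $c'_{qq}$, we get

$$c'_{qq}\Big\{\Sigma(x_q^2) - \sum_{j,k=1}^{p} c_{jk}\,\Sigma(x_jx_q)\,\Sigma(x_kx_q)\Big\} = 1. \tag{22.103}$$

This gives $c'_{qq}$, and the $c'_{kq}$ are derivable from (22.102). The other constants then result from (22.101).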

Cochran (1938a), to whom this proof is due, says that the elimination of two variates 
is best carried out in two stages of one each ; that where one variate is eliminated the 
method is quicker than re-solving the regression equations, except where there are only 
two independent variates in the first instance ; and that if two variates are being eliminated 
the method is quicker if the original number of independent variates is six or more. For 
the addition of variates the method is in all cases more expeditious than re-solving the 
regression equations. 
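A sketch of the procedure of 22.30 for adding a variate, again in Python with NumPy, is given below. It assumes the old $c$'s are available as a matrix `C`. The new coefficient $b'_q$ is taken as $b'_q = c'_{qq}\{\Sigma(x_qy) - \sum_j b_j\Sigma(x_jx_q)\}$; this step is not spelled out in the text and is supplied here as part of the sketch (it follows on substituting (22.101) in the last normal equation). The function name `add_variate` and the simulated data are hypothetical.

```python
import numpy as np

def add_variate(C, b, s, s_qq, s_qy):
    """Add a new variate x_q to a fitted regression (Cochran's method).

    C    : inverse of the old p x p matrix of sums of products
    b    : old regression coefficients (length p)
    s    : the sums S(x_j x_q), j = 1..p
    s_qq : S(x_q^2);  s_qy : S(x_q y)
    """
    Cs = C @ s                                   # sum_j c_kj S(x_j x_q)
    c_qq = 1.0 / (s_qq - s @ Cs)                 # (22.103)
    ratio = -Cs                                  # c'_kq / c'_qq, (22.102)
    p = len(b)
    C_new = np.empty((p + 1, p + 1))
    C_new[:p, :p] = C + c_qq * np.outer(ratio, ratio)   # (22.101)
    C_new[:p, p] = C_new[p, :p] = c_qq * ratio          # c'_kq
    C_new[p, p] = c_qq
    b_q = c_qq * (s_qy - b @ s)                  # from the last normal equation
    b_new = np.append(b + b_q * ratio, b_q)      # (22.101)
    return C_new, b_new

# Illustrative data: fit on four variates, then add a fifth
rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5))
y = X @ np.array([0.5, -1.0, 0.2, 0.0, 1.5]) + rng.normal(size=40)
X -= X.mean(axis=0)
y -= y.mean()
S, Sy = X.T @ X, X.T @ y
C4 = np.linalg.inv(S[:4, :4])
b4 = C4 @ Sy[:4]
C5, b5 = add_variate(C4, b4, S[:4, 4], S[4, 4], Sy[4])
```

Only one matrix-vector product and one outer product are needed, which accords with Cochran's remark that adding a variate this way is always quicker than re-solving.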
Example 22.10 (Cochran, 1938a)

In a study of the effect of weather factors on the number of noctuid moths per night caught in a light-trap, regressions were worked out on $x_1$ (minimum night temperature), $x_2$ (the maximum temperature of the previous day), $x_3$ (the average speed of the wind during the night), and $x_4$ (the amount of rain during the night). The dependent variate was $\log(1 + n)$, where $n$ was the number of moths.

It was subsequently decided to investigate the effect of cloudiness, measured on a conventional scale as the percentage of starlight obscured by clouds in a night sky camera. This is the new variate $x_5$.

The quantities $c_{jk}$ for the first four variates were:

            x_1            x_2            x_3            x_4
x_1    +0.10542356    −0.04194620    −0.09606709    −0.01849096
x_2        ...        +0.08603869    +0.03317271    +0.01290358
x_3        ...            ...        +0.57265201    +0.00811662
x_4        ...            ...            ...        +0.06227532

and the sums $\Sigma(x_jx_5)$ were

$$\Sigma(x_1x_5) = -4.867, \quad \Sigma(x_2x_5) = +0.206, \quad \Sigma(x_3x_5) = -0.5446, \quad \Sigma(x_4x_5) = -5.42, \quad \Sigma(x_5^2) = 7.87.$$

We then find from (22.103)

$$c'_{55} = +0.21013314,$$

and from (22.102)

$$\frac{c'_{15}}{c'_{55}} = +0.36919824, \quad \frac{c'_{25}}{c'_{55}} = -0.13387286, \quad \frac{c'_{35}}{c'_{55}} = -0.11853374, \quad \frac{c'_{45}}{c'_{55}} = +0.24929891,$$

so that the new $c$'s are given by (22.101) as



            x_1            x_2            x_3            x_4            x_5
x_1    +0.13406626    −0.05233216    −0.10526303    +0.00084984    +0.07758079
x_2        ...        +0.08980468    +0.03650720    +0.00589052    −0.02813112
x_3        ...            ...        +0.57560443    +0.00190712    −0.02490787
x_4        ...            ...            ...        +0.07533508    +0.05238596
x_5        ...            ...            ...            ...        +0.21013314


The original regression coefficients were

$$b_1 = +0.1981407, \quad b_2 = +0.0385284, \quad b_3 = -0.5086492, \quad b_4 = +0.0318482.$$

We now find

$$b'_5 = -0.2271496,$$

and from (22.101) we then have

$$b'_1 = +0.1142776, \quad b'_2 = +0.0689376, \quad b'_3 = -0.4817243, \quad b'_4 = -0.0247799.$$

As usual we have retained more figures than are necessary, in order to avoid cumulating 
errors and to facilitate the detection of computational slips. 
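As a check on the arithmetic of Example 22.10, the short NumPy script below recomputes $c'_{55}$, the ratios $c'_{k5}/c'_{55}$, the new $c$'s and the new $b$'s from (22.101) to (22.103). The input figures are transcribed from the example; $b'_5$ is taken as given, since the sums $\Sigma(x_jy)$ needed to compute it are not reproduced in the text. This is a sketch for checking, not a general routine.

```python
import numpy as np

# c_jk for x_1..x_4, from Example 22.10 (the matrix is symmetric)
C = np.array([
    [ 0.10542356, -0.04194620, -0.09606709, -0.01849096],
    [-0.04194620,  0.08603869,  0.03317271,  0.01290358],
    [-0.09606709,  0.03317271,  0.57265201,  0.00811662],
    [-0.01849096,  0.01290358,  0.00811662,  0.06227532],
])
s = np.array([-4.867, 0.206, -0.5446, -5.42])   # S(x_j x_5), j = 1..4
s55 = 7.87                                       # S(x_5^2)
b = np.array([0.1981407, 0.0385284, -0.5086492, 0.0318482])
b5_new = -0.2271496                              # b'_5, quoted from the text

Cs = C @ s
c55 = 1.0 / (s55 - s @ Cs)                   # (22.103)
ratio = -Cs                                  # c'_k5 / c'_55, (22.102)
C_new = C + c55 * np.outer(ratio, ratio)     # c'_jk for j, k <= 4, (22.101)
c_k5 = c55 * ratio                           # c'_k5
b_new = b + b5_new * ratio                   # b'_1..b'_4, (22.101)

print(f"c'_55 = {c55:.8f}")
print("c'_k5/c'_55:", np.round(ratio, 8))
print("b'_1..b'_4:", np.round(b_new, 7))
```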
22.31. The constants $c$ found in the foregoing method have a further use: they give the standard errors of the regression coefficients and provide some of the functions required in more exact tests based on the $t$-distribution. If, measuring $y$ about the mean, we have

$$Y = b_1x_1 + b_2x_2 + \dots + b_px_p,$$

then there are $p$ equations of the kind

$$\Sigma(x_1y) = b_1\Sigma(x_1^2) + b_2\Sigma(x_1x_2) + \dots + b_p\Sigma(x_1x_p),$$

and thus, recalling the definition of the $c$'s, we have

$$b_1 = c_{11}\Sigma(x_1y) + c_{12}\Sigma(x_2y) + \dots + c_{1p}\Sigma(x_py).$$

Thus, for fixed values of the $x$'s,

$$\operatorname{var} b_1 = \operatorname{var}\Big\{\sum_{j=1}^{p} c_{1j}\Sigma(x_jy)\Big\} = c_{11}\operatorname{var} y, \tag{22.104}$$

since $\sum_j\sum_k c_{1j}c_{1k}\Sigma(x_jx_k) = c_{11}$ by the equations defining the $c$'s; and so for the other $b$'s.

For large samples $\operatorname{var} y$ may be taken to be the estimated variance

$$\frac{1}{n - p - 1}\,\Sigma(y - Y)^2. \tag{22.105}$$

If the sample is small and it is desired to make a more accurate test, then we have, by an extension of 22.21, that

$$t = \frac{(b_1 - \beta_1)\sqrt{n - p - 1}}{\sqrt{\Sigma(y - Y)^2}\,\sqrt{c_{11}}}, \tag{22.106}$$

where $\beta_1$ is the parent value of the coefficient, is distributed in "Student's" form with $\nu = n - p - 1$ degrees of freedom.
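The recipe of 22.31 can be sketched as follows in Python with NumPy; the function name and the simulated data are illustrative assumptions. The standard error of each $b_j$ is the square root of $c_{jj}$ times the estimate $\Sigma(y - Y)^2/(n - p - 1)$ of $\operatorname{var} y$, and $b_j$ divided by its standard error is the $t$ of (22.106) for the hypothesis that the parent coefficient is zero.

```python
import numpy as np

def regression_t_values(X, y):
    """Coefficients, standard errors and t-values for the regression of y on
    the columns of X, following 22.31: var b_j = c_jj var y, with var y
    estimated by S(y - Y)^2 / (n - p - 1)."""
    X = X - X.mean(axis=0)                 # measure variates about their means
    y = y - y.mean()
    n, p = X.shape
    C = np.linalg.inv(X.T @ X)             # the constants c_jk
    b = C @ (X.T @ y)
    resid = y - X @ b                      # y - Y
    var_y = resid @ resid / (n - p - 1)
    se = np.sqrt(np.diag(C) * var_y)       # square roots of c_jj var y
    return b, se, b / se, n - p - 1        # t on n - p - 1 d.f. (parent value 0)

rng = np.random.default_rng(2)
X = rng.normal(size=(40, 3))
y = 2.0 + X @ np.array([0.5, -1.0, 0.0]) + rng.normal(size=40)
b, se, t, df = regression_t_values(X, y)
```

The same matrix gives the covariances: the off-diagonal $c_{jk}$ times the estimated variance is the covariance of $b_j$ and $b_k$.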


22.32. As a final comment we may emphasise that regression equations are only 
polynomials fitted to the means of arrays, and consequently that if the scatter about 
those means is substantial they are not very reliable as estimators (though they may be 
better than other methods). The comment would hardly be necessary were it not for a 
tendency to use the equations somewhat uncritically for purposes of prediction. The 
point assumes even greater importance when attempts are made to estimate the dependent 
variate for values of the independent variates outside the range on which the regressions 
are based ; or again, if the observations are distributed over time so that the population 
may be changing while the sample is being drawn. The technique of regression analysis 
is undoubtedly useful in many fields, but — as with many other statistical techniques — 
the careful investigator will apply it with a certain amount of self-discipline. 


NOTES AND REFERENCES 

The theory of curvilinear regression was studied by Karl Pearson (1905). Orthogonal 
polynomials had been considered, and the essential problems solved, by Tchebycheff as 
far back as 1857, but their use in statistics was not fully appreciated until about sixty years 
later. Pearson gave in 1921 the general formulae for fitting curved regression lines up to 
the fourth order. Neyman (1926) pointed out the elegance of the determinantal approach. 

From about 1920 onwards there may be discerned two main lines of development. The Scandinavian school, led by Wicksell, has developed the analytical theory of regression; see Wicksell (1917b, 1933, 1934b) and a useful memoir by W. Andersson (1932). The second line, followed by Fisher, Aitken and others, has been concerned with the fitting of regression curves to arithmetical data and exact significance tests; see Fisher's papers of 1921b, 1922b, 1924b, 1926a, a paper by Allan (1930), and three papers by Aitken (1933a, b, c). The literature on orthogonal polynomials is now very large.

For some illustrative material, see K. Pearson (1905), Andersson (1932), and Pretorius 
(1930). See also references to Chapters 14 and 15. 


EXERCISES 

22.1. Show that the regression of y on the variance of x (the scedastic curve) is 
given by 




where 


(Wicksell, 1934b.)

22.2. Show that if the regression of y on the mean of x is linear, then from (22.11) 


is a linear function of ^ (ti) and j- ^ (^i). Hence that 

$$\kappa_{21}\kappa_{20} = \kappa_{11}\kappa_{30}.$$

(Wicksell, 1934b.)


22.3. Show that if the marginal distribution of a bivariate distribution is of the Gram-Charlier Type A:

$$f = \alpha(x)\{1 + a_1H_1 + a_2H_2 + \dots\},$$

the regression of $y$ on $x$ is

■30 TO 

Y _ j=^o 3 ' 

1 +^ajHj(X) 

(Wicksell, 1917b.)


22.4. Transforming the orthogonal polynomials of (22.74) to a new variate $\xi = x - \frac{n+1}{2}$, note that $P_p - \xi P_{p-1}$ is a numerical multiple of $P_{p-2}$, say $\lambda P_{p-2}$. Show that

and deduce the recurrence relation

$$P_p = \xi P_{p-1} - \frac{(p-1)^2\{n^2 - (p-1)^2\}}{4(2p-1)(2p-3)}\,P_{p-2}.$$


(Allan, 1930. The relation is due to Tchebycheff.) 
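The recurrence of Exercise 22.4 is easy to check numerically. The sketch below, in Python with NumPy, assumes that the variate takes the values $1, 2, \dots, n$, so that $\xi = x - (n+1)/2$ is measured from the centre of the range. It builds $P_0 = 1$ and $P_1 = \xi$, generates the higher polynomials from the recurrence, and forms the matrix of their sums of products over the $n$ points, which should be diagonal. The function name is illustrative.

```python
import numpy as np

def orthogonal_polys(n, degree):
    """Values of P_0..P_degree at xi = x - (n + 1)/2, x = 1..n, generated by
    P_p = xi P_(p-1) - (p-1)^2 {n^2 - (p-1)^2} / {4 (2p-1)(2p-3)} P_(p-2)."""
    xi = np.arange(1, n + 1) - (n + 1) / 2
    P = [np.ones(n), xi]
    for p in range(2, degree + 1):
        k = (p - 1) ** 2 * (n ** 2 - (p - 1) ** 2) / (4 * (2 * p - 1) * (2 * p - 3))
        P.append(xi * P[p - 1] - k * P[p - 2])
    return np.array(P[: degree + 1])

P = orthogonal_polys(9, 5)
G = P @ P.T                  # sums of products of the polynomials over the 9 points
```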
22.5. A regression line

$$Y = a_0 + a_1x + a_2x^2 + a_3x^3 + a_4x^4$$

is fitted to normal data and the number of observations $N$ is large. If $r$ is the correlation between the variates and $e = \dots$ (the moments referring to the $x$-variate), show that


var 

Uo 


- 86 ® + 6 *) (1 - 

■r*) 

var 



- 186* + 46®) (1 

-r*) 

var 

0, 


36*) (1 - r*) 


var 

0, 


-r*) 


var 

a« 





(Andersson, 1932.) 


22.6. In the notation of 22.31 show that

$$\operatorname{cov}(b_1, b_2) = c_{12}\operatorname{var} y,$$

and hence show how to test the difference of two coefficients in a regression equation.

22.7. Show how to derive a test of the significance of the difference of corresponding 
regression coefficients in two equations derived from independent samples, based on the 
result of 21.26. 



CHAPTER 23


THE ANALYSIS OF VARIANCE (1)


23.1. At various points in this book we have encountered in different guises the result that the sum of squares of a set of observations about their mean can be represented as the sum of two independent sums of squares, each of which provides an estimate of the parent variance; and that their ratio provides a test of homogeneity, at least when the parent is normal. We now proceed to study in more detail a method of statistical analysis with considerable generality which springs from this result. In view of the complexity of the general case we shall begin by considering simpler cases under somewhat restrictive conditions and shall extend our results stage by stage.

One-way Classification

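23.2. Suppose we have a set of variate-values divided into $p$ families:

$$\begin{array}{cccc} x_{11} & x_{12} & \cdots & x_{1p} \\ x_{21} & x_{22} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \end{array}$$

Denoting by $\bar{x}$ the mean of the whole set and by $\bar{x}_j$ the mean of the values in the $j$th family, we have the identity

$$\sum_{i,j}(x_{ij} - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j + \bar{x}_j - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j)^2 + \sum_{i,j}(\bar{x}_j - \bar{x})^2, \tag{23.1}$$

since the cross-product term $2\sum_{i,j}(x_{ij} - \bar{x}_j)(\bar{x}_j - \bar{x})$ vanishes, the deviations within each family summing to zero. We may also write this as

$$\sum_{i,j}(x_{ij} - \bar{x})^2 = \sum_{i,j}(x_{ij} - \bar{x}_j)^2 + \sum_j n_j(\bar{x}_j - \bar{x})^2, \tag{23.2}$$

where $n_j$ is the number of members in the $j$th family.

It will also be convenient, from the point of view of a later generalisation, to write the mean of the $j$th family as $x_{.j}$ and that of the whole as $x_{..}$, the periods in the subscripts showing which factor is being averaged. We have then the alternative form

$$\sum_{i,j}(x_{ij} - x_{..})^2 = \sum_{i,j}(x_{ij} - x_{.j})^2 + \sum_j n_j(x_{.j} - x_{..})^2. \tag{23.3}$$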


23.3. The problem we shall discuss in connection with families of values of this type takes some such form as the following: the members of each family are randomly chosen from some parent population corresponding to that family. The populations themselves are, as a rule, defined by some prior system of classification given among the data of the problem, e.g. they might be different varieties of wheat, the $x$'s being the yields of the varieties grown under similar conditions, or they might be defined by income levels and the $x$'s the expenditure on food of a sample chosen from the different income groups. We now ask: is there any evidence that the factor measured by $x$ varies significantly from family to family? Alternatively, can the data be regarded as homogeneous, i.e. as emanating from populations which are identical so far as concerns the factor measured by $x$? Further, when the question of significance is decided, how can we estimate the variation of $x$ in families or groups of families, and how can we estimate the magnitude of any differences which exist?


23.4. We will assume, until further notice, that within each family the variation is normal with variance $v$, and that $v$ is the same for each family. In later sections we shall endeavour to remove these rather restrictive conditions. On our present hypothesis the populations corresponding to the different families can differ, if at all, only in their means, and our first question is whether the sample values afford any evidence of such differences.

Let us take as our hypothesis that the parent populations have a common mean $m$. Then we recall the following facts:

(1) The sum $\frac{1}{v}\sum_{i,j}(x_{ij} - x_{..})^2$ is distributed in the Type III form of $\chi^2$ with $N - 1 = \sum_j(n_j) - 1$ degrees of freedom, that is to say as the sum of squares of $N - 1$ independent normal variates with zero mean and unit variance.

(2) In any given family $x_{.j}$ is distributed normally with variance $v/n_j$ about mean $m$, and is independent of the sum $\sum_i(x_{ij} - x_{.j})^2$, which is itself distributed as $v\chi^2$ with $n_j - 1$ degrees of freedom.

Since on our hypothesis the observations may be regarded as a single sample from the same population, it follows that

$$\begin{aligned} \frac{1}{v}\sum_{i,j}(x_{ij} - x_{..})^2 &\ \text{is distributed as}\ \chi^2\ \text{with}\ N - 1\ \text{d.f.}, \\ \frac{1}{v}\sum_{i,j}(x_{ij} - x_{.j})^2 &\ \text{is distributed as}\ \chi^2\ \text{with}\ \textstyle\sum_j(n_j - 1) = N - p\ \text{d.f.}, \\ \frac{1}{v}\sum_j n_j(x_{.j} - x_{..})^2 &\ \text{is distributed as}\ \chi^2\ \text{with}\ p - 1\ \text{d.f.} \end{aligned} \tag{23.4}$$

The only statement requiring any proof is the last. It may be proved directly (see Exercise 23.1), but we shall deduce it as the corollary of a general theorem due to R. A. Fisher which will often be required in this chapter.


23.5. Suppose we have $q$ variates $x_1, \dots, x_q$ which are independently and normally distributed with unit variance about the same mean, which we may assume to be zero. Put

$$\xi_r = \sum_{s=1}^{q}\lambda_{rs}x_s, \qquad r = 1, \dots, q. \tag{23.5}$$

If we choose the coefficients $\lambda$ so that

$$\sum_{s=1}^{q}\lambda_{rs}\lambda_{ts} = \begin{cases} 1, & r = t, \\ 0, & r \neq t, \end{cases} \tag{23.6}$$

then each $\xi$ is distributed normally with unit variance, independently of the others (each $\xi_r$ is a linear function of independent normal variates with variance $\sum_s\lambda_{rs}^2 = 1$, and its covariance with every other $\xi_t$ is zero). There are $q^2$ coefficients $\lambda$, and the equations (23.6) impose $\frac{1}{2}q(q + 1)$ conditions on them, so that the $\lambda$'s can always be found in a multiplicity of ways; in effect they correspond to the rotation of orthogonal co-ordinate axes in a $q$-dimensional space.

Now suppose that we have $h$ linear functions of the $x$'s, $\xi_1, \dots, \xi_h$ ($h < q$), whose coefficients obey the orthogonality relations (23.6). These $h$ variates are then distributed independently, normally and with unit variance.

It is now possible to find $q - h$ further variates $\xi_{h+1}, \dots, \xi_q$ which are orthogonal among themselves and to $\xi_1, \dots, \xi_h$. Geometrically this is evident from the possibilities of rotations in the $q$-way space. Algebraically it follows from the consideration that if $qh$ of the $\lambda$'s in (23.6) are known, $q(q - h)$ are unknown, and the number of conditions they must obey is

$$\tfrac{1}{2}q(q + 1) - \tfrac{1}{2}h(h + 1) = \tfrac{1}{2}(q - h)(q + h + 1),$$

so that values of the unknowns can be found in at least one way if

$$\tfrac{1}{2}(q + h + 1) \le q, \quad\text{or}\quad h + 1 \le q.$$

Now suppose we express a sum of squares of $q$ normal variates with unit variance, say $A$, as the sum of two quantities $B$ and $C$; and suppose that $B$ is distributed as the sum of squares of $h$ independent normal variates with unit variance which are linear functions of the variates entering into $A$. Then we can find $q - h$ such variates independent of the first $h$, and $C$ must be their sum of squares. Further, $B$ and $C$ are independently distributed. By an extension of the same argument, if

$$A = A_1 + A_2 + \dots + A_k, \tag{23.7}$$

$A$ is distributed as $\chi^2$ with $\nu$ degrees of freedom, $A_1$ with $\nu_1$, ..., $A_{k-1}$ with $\nu_{k-1}$; and if the variates entering into $A_1, \dots, A_{k-1}$ are mutually independent and are linear functions of those entering into $A$, then $A_k$ is distributed as $\chi^2$ with $\nu_k$ degrees of freedom, where

$$\nu = \nu_1 + \nu_2 + \dots + \nu_k, \tag{23.8}$$

and $A_k$ is independent of $A_1, \dots, A_{k-1}$.
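The construction used in 23.5, completing $h$ orthonormal rows of coefficients to a full $q \times q$ orthogonal set, can be sketched numerically. The version below, in Python with NumPy, uses a QR factorisation, which is only one of the many possible completions; the function name is illustrative. Taking $h = 1$ and the single row $(1/\sqrt{q}, \dots, 1/\sqrt{q})$, so that $\xi_1 = \sqrt{q}\,\bar{x}$, the remaining $q - 1$ variates carry exactly the sum of squares about the mean.

```python
import numpy as np

def complete_orthonormal(L):
    """Extend the h x q array L, whose rows satisfy the relations (23.6),
    to a q x q orthogonal matrix whose first h rows are those of L."""
    h, q = L.shape
    # Orthonormalise the rows of L followed by the unit vectors e_1..e_q
    Q, _ = np.linalg.qr(np.vstack([L, np.eye(q)]).T, mode="complete")
    T = Q.T.copy()
    T[:h] = L      # QR reproduces the rows of L only up to sign; restore them
    return T

q = 6
L = np.full((1, q), 1 / np.sqrt(q))        # xi_1 = sqrt(q) * mean of the x's
T = complete_orthonormal(L)
x = np.random.default_rng(3).normal(size=q)
xi = T @ x
print(np.isclose((xi[1:] ** 2).sum(), ((x - x.mean()) ** 2).sum()))
```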

23.6. As an extension and kind of converse of this theorem we have the result, due to Cochran, that if $A_1, \dots, A_k$ are distributed as $\chi^2$ with $\nu_1, \dots, \nu_k$ degrees of freedom, and their sum $A$ is distributed as $\chi^2$ with $\nu = \sum_j(\nu_j)$ degrees, then $A_1, \dots, A_k$ are independent. We will prove this for the case $k = 2$, the more general result following in a similar way.

If the characteristic function of $A_1$ and $A_2$ is $\phi(t_1, t_2)$, we have, by hypothesis,

$$\phi(t, 0) = (1 - 2it)^{-\nu_1/2}, \qquad \phi(0, t) = (1 - 2it)^{-\nu_2/2}, \qquad \phi(t, t) = (1 - 2it)^{-\nu/2}.$$

Hence

$$\phi(t, t) = \phi(t, 0)\,\phi(0, t) = (1 - 2it)^{-(\nu_1 + \nu_2)/2};$$

thus $\phi(t, 0)$ and $\phi(0, t)$ are divisible by a factor in $(1 - 2it)^{-1/2}$ and no other factor in $t$, because of the symmetry of $\phi(t_1, t_2)$. These factors are identified by $\phi(t_1, 0)$ and $\phi(0, t_2)$ as $(1 - 2it_1)^{-\nu_1/2}$ and $(1 - 2it_2)^{-\nu_2/2}$, and hence

$$\phi(t_1, t_2) = \phi(t_1, 0)\,\phi(0, t_2),$$

or $A_1$ and $A_2$ are independent.

23.7. Let us now return to the statements in (23.4). The sum $\frac{1}{v}\sum(x_{ij} - x_{..})^2$ is distributed as $\chi^2$ with $\nu = N - 1$. The sum $\frac{1}{v}\sum(x_{ij} - x_{.j})^2$ is so distributed with $\nu_1 = N - p$. Further, the quantities $x_{ij} - x_{.j}$ may be transformed to $N - p$ independent normal variates which are linear functions of the variates entering into the first sum. It follows from 23.5 that, because of the identity (23.3), the third sum $\frac{1}{v}\sum_j n_j(x_{.j} - x_{..})^2$ is distributed as $\chi^2$ with $\nu_2 = (N - 1) - (N - p) = p - 1$ degrees of freedom, and that independently of the second sum.

Thus we may exhibit our break-up of the total sum in the following form:


TABLE 23.1

Form of Analysis of Variance for One-way Classification

$$\begin{array}{lccc}
 & \text{Sum of squares} & \text{d.f.} & \text{Quotient} \\ \hline
\text{Of family means about the mean of the whole} & \sum_j n_j(x_{.j} - x_{..})^2 & p - 1 & \frac{1}{p-1}\sum_j n_j(x_{.j} - x_{..})^2 \\
\text{Of individuals in families about the respective family means} & \sum_{i,j}(x_{ij} - x_{.j})^2 & N - p & \frac{1}{N-p}\sum_{i,j}(x_{ij} - x_{.j})^2 \\
\text{Of individuals about the mean of the whole} & \sum_{i,j}(x_{ij} - x_{..})^2 & N - 1 & \frac{1}{N-1}\sum_{i,j}(x_{ij} - x_{..})^2
\end{array}$$

We note that the sums of squares and the degrees of freedom in the first two rows sum to those in the third row (though the quantities in the quotient column are not additive). This is the origin of the expression "analysis of variance", though, to be accurate, it is the total sum of squares which is analysed.

To avoid cumbrous phrases we refer to the sum of squares of family means about the mean of the whole as the "sum of squares between families", and to that of individuals about the respective family means (for the time being) as "residual". We shall also speak of total sum of squares and total mean with the obvious significance, and denote degrees of freedom by the initial letters "d.f."*

* The need has been felt for a word to denote "sum of squares about the mean". Professor Pitman has suggested the word "squariance", though he seems to feel that this leaves something to be desired. In my own notes I use the word "deviance" but have not ventured to introduce it into the text.

23.8. Since the mean value of $\chi^2$ with $\nu$ degrees of freedom is $\nu$, the quotients in Table 23.1 are all unbiassed estimators of $v$, the parent variance. Only the first two, however, are independent. We recall that the ratio

$$z = \frac{1}{2}\log\frac{\frac{1}{p-1}\sum_j n_j(x_{.j} - x_{..})^2}{\frac{1}{N-p}\sum_{i,j}(x_{ij} - x_{.j})^2} \tag{23.9}$$

is distributed in Fisher's form, which is independent of the variance $v$. This distribution accordingly provides a convenient test of significance in the normal case.


Example 23.1

Let us consider the application of the foregoing theory to a simple example which has been chosen to reduce the arithmetic to a small amount. The following shows the lives in hours of four batches of electric lamps:

Batch 1: 1600, 1610, 1650, 1680, 1700, 1720, 1800.

Batch 2: 1580, 1640, 1640, 1700, 1750.

Batch 3: 1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820.

Batch 4: 1510, 1520, 1530, 1570, 1600, 1680.

We know that the batches were made from four different specimens of wire, but were otherwise made under identical conditions. (This, of course, over-simplifies the problem as it is encountered in practice, but will serve for purposes of illustration.) The question is, do the batches differ among themselves in length of life? If so, we suspect that the quality of wire is varying materially, and if the lamps are to be standardised as far as possible the quality of wire must be made more uniform from batch to batch before manufacture is undertaken. The numbers in this example are small, but not much smaller than would be desirable in practice, owing to the expense and time involved in testing a lamp by running it until it burns out.

The sums of $x$ and $x^2$ for the four batches will be found to be:

              Number in sample      Σ(x)         Σ(x²)
Batch 1              7             11,760      19,785,400
Batch 2              5              8,310      13,828,100
Batch 3              8             13,090      21,503,700
Batch 4              6              9,410      14,778,700
Totals              26             42,570      69,895,900

Thus for the mean life of lamp in the four batches we have 11,760/7 = 1680; 8310/5 = 1662; 13,090/8 = 1636.25; 9410/6 = 1568.33. These certainly differ, but is the variation such as cannot have arisen by mere sampling fluctuations?

We find

$$x_{..} = 42{,}570/26 = 1637.3077,$$

$$\sum(x_{ij} - x_{..})^2 = \sum x^2 - Nx_{..}^2 = 69{,}895{,}900 - 69{,}700{,}189 = 195{,}711.$$


We also have

$$\sum_j n_j(x_{.j} - x_{..})^2 = \sum_j n_jx_{.j}^2 - Nx_{..}^2 = 44{,}360.$$

The analysis then takes the form:

                     Sum of squares    d.f.    Quotient
Between batches          44,360          3      14,787
Residual                151,351         22       6,880
Totals                  195,711         25       7,828

We have

$$z = \frac{1}{2}\log_e\frac{14{,}787}{6880} = 0.383, \qquad \nu_1 = 3, \quad \nu_2 = 22.$$

The 5 per cent. point for these degrees of freedom is seen from the tables to be 0.5574. The observed value is therefore not significant, and we conclude that, so far as this test is concerned, there is nothing to throw doubt on the homogeneity of the group.

Having decided, provisionally at least, to accept the hypothesis that the data are homogeneous, we may ask, what is the best estimate of the parent variance? Our analysis has given three different estimates, viz. 14,787, 6880 and 7828. It seems natural to use the last, which depends on the greatest number of degrees of freedom.

With this value we find for the variance of the mean of samples of $n$

$$\frac{7828}{n}, \quad\text{i.e. a standard error of}\quad \frac{88.48}{\sqrt{n}}.$$

The greatest difference of means observed is that between the first and fourth batch, 1680 − 1568.33 = 111.67. The standard error of this difference is

$$88.48\sqrt{\tfrac{1}{7} + \tfrac{1}{6}} = 49.2.$$

The observed difference is rather more than twice the standard error, but we cannot conclude that it is significant on that account. In fact, we have picked out the greatest difference for examination from the six possible comparisons of pairs, and the distribution of the greatest difference must have a larger standard error than that of a difference chosen at random, which is what we have calculated. Nevertheless the fact that even the greatest difference is only slightly in excess of twice the standard error affords some general evidence in support of the hypothesis of homogeneity.

We may also note that if a more accurate test of the difference of two means is required the $t$-test may be invoked; but here also we must remember that we are testing the greatest of a set of differences. Where there are only two families concerned, the analysis of variance reduces to the $t$-test for the difference of sample means when variances of the parents are assumed equal.
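The computations of Example 23.1 can be reproduced with a few lines of Python (standard library only). The function below is a sketch of Table 23.1 and of the ratio (23.9); it works with deviations rather than with the short-cut $\Sigma x^2 - Nx_{..}^2$, which is convenient by hand but can lose accuracy in floating point when the mean is large compared with the spread. The function name is illustrative.

```python
import math

def one_way_anova(families):
    """Sums of squares, d.f. and quotients of Table 23.1, and z of (23.9)."""
    N = sum(len(f) for f in families)
    p = len(families)
    grand = sum(sum(f) for f in families) / N                  # x..
    means = [sum(f) / len(f) for f in families]                # x.j
    between = sum(len(f) * (m - grand) ** 2 for f, m in zip(families, means))
    residual = sum((x - m) ** 2 for f, m in zip(families, means) for x in f)
    total = sum((x - grand) ** 2 for f in families for x in f)
    table = {
        "between": (between, p - 1, between / (p - 1)),
        "residual": (residual, N - p, residual / (N - p)),
        "total": (total, N - 1, total / (N - 1)),
    }
    z = 0.5 * math.log(table["between"][2] / table["residual"][2])
    return table, z

lamps = [
    [1600, 1610, 1650, 1680, 1700, 1720, 1800],
    [1580, 1640, 1640, 1700, 1750],
    [1460, 1550, 1600, 1620, 1640, 1660, 1740, 1820],
    [1510, 1520, 1530, 1570, 1600, 1680],
]
table, z = one_way_anova(lamps)
for name, (ss, df, quotient) in table.items():
    print(f"{name:<9}{ss:>12,.0f}{df:>5}{quotient:>10,.0f}")
print(f"z = {z:.3f}")
```

The significance point itself still comes from tables; if SciPy is available, `scipy.stats.f.ppf(0.95, 3, 22)` gives the 5 per cent. point of $F$, and half its natural logarithm is the corresponding point of $z$.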


23.9. Suppose now that in the case of one-way classification we have applied a test by means of the analysis of variance and have found that the hypothesis of homogeneity is unacceptable, or, in plain English, that the parents do differ. Let us then consider the alternative that the populations are still normal and that they differ in their means but not in their variances.

At first sight this may seem a highly artificial assumption to make, for if the populations differ in their means it is not unlikely that they may differ in other respects. This is undoubtedly so, but if there is serious possibility of difference in variances their homogeneity may be discussed separately by means of tests we shall consider in Chapter 26. Apart from this, there often arise in practice situations in which approximate equality of variance is plausible on prior grounds. For instance, we may be testing the effect of manuring on cereal yields, and it is reasonable to suppose that if the manure exerts any effect at all it will increase all plants of the same variety to about the same extent; that is, it will displace the location of the distribution of yields without affecting its dispersion.


23.10. The question we have now to consider is whether we can make an estimate of the common variance of the populations. A little thought will show that we can. The reasoning which led to the conclusion that the residual sum of squares is distributed as $v\chi^2$ with $N - p$ degrees of freedom remains unchanged, so that the residual quotient in Table 23.1 continues to provide an estimator of $v$. The other two no longer do so. Consider, in fact, the sum of squares between families, and let the mean of the $j$th family be $m_j$. Then we have

$$\sum_j n_j(x_{.j} - x_{..})^2 = \sum_j n_j\{(x_{.j} - m_j) - (x_{..} - \bar{m}) + (m_j - \bar{m})\}^2, \tag{23.10}$$

where $\bar{m} = \frac{1}{N}\sum_j n_jm_j$. Here $\bar{m}$ is the mean value of $x_{..}$, and $x_{..} - \bar{m}$ is the weighted mean of the quantities $x_{.j} - m_j$. Thus $\sum_j n_j\{x_{.j} - m_j - (x_{..} - \bar{m})\}^2$ is distributed as $v\chi^2$ with $p - 1$ degrees of freedom and, since the cross-product term has zero mean,

$$E\sum_j n_j(x_{.j} - x_{..})^2 = (p - 1)v + \sum_j n_j(m_j - \bar{m})^2. \tag{23.11}$$

Not unless $m_1 = m_2 = \dots = m_p$, that is, all populations have the same mean, does the expression on the right reduce to $(p - 1)v$, and hence the quotient between families give an unbiassed estimator of $v$. In other cases it is greater.

Similarly,

$$E\sum_{i,j}(x_{ij} - x_{..})^2 = (N - 1)v + \sum_j n_j(m_j - \bar{m})^2. \tag{23.12}$$

The expectation of the difference of the two terms considered in (23.11) and (23.12) confirms that the residual sum of squares provides an estimator of $(N - p)v$.
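A small Monte Carlo experiment illustrates the expectations of 23.10. The sketch below, in Python with NumPy, draws normal families with unequal means $m_j$ and a common variance $v$, and compares the average between-families and residual sums of squares with $(p - 1)v + \sum_j n_j(m_j - \bar{m})^2$, where $\bar{m} = \sum_j n_jm_j/N$, and with $(N - p)v$. The family sizes, means, number of replications and seed are arbitrary choices made for illustration.

```python
import numpy as np

def average_sums_of_squares(n, m, v, reps=10_000, seed=0):
    """Average between-family and residual sums of squares over `reps`
    samples of normal families with sizes n, means m and variance v."""
    rng = np.random.default_rng(seed)
    n = np.asarray(n)
    between = np.empty(reps)
    residual = np.empty(reps)
    for r in range(reps):
        fams = [rng.normal(mj, np.sqrt(v), size=nj) for nj, mj in zip(n, m)]
        means = np.array([f.mean() for f in fams])
        grand = np.concatenate(fams).mean()
        between[r] = (n * (means - grand) ** 2).sum()
        residual[r] = sum(((f - f.mean()) ** 2).sum() for f in fams)
    return between.mean(), residual.mean()

n, m, v = [4, 5, 6], np.array([0.0, 0.5, 1.0]), 1.0
p, N = len(n), sum(n)
m_bar = np.dot(n, m) / N
expected_between = (p - 1) * v + np.dot(n, (m - m_bar) ** 2)   # (23.11)
expected_residual = (N - p) * v
sim_between, sim_residual = average_sums_of_squares(n, m, v)
print(round(expected_between, 3), round(sim_between, 3))
print(expected_residual, round(sim_residual, 3))
```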


23.11. A comparison of the formulae we have already reached and those of section 14.31 will show that the study of intra-class correlation is very closely related to the analysis of variance. It is an interesting exercise to derive the $z$-test directly from the sampling distribution of intra-class $r$ given in equation (14.110) (vol. I, p. 362) and vice versa.


Two-way Classification

23.12. We proceed to the case when the variate-values belong not to one of a single set of families but to two, say $A$ and $B$. In the first instance we shall consider the situation when there is only a single value in the $j$th class of $A$ and the $k$th class of $B$. Our sample may then be set out in the tabular form:

$$\begin{array}{c|cccc|c}
 & B_1 & B_2 & \cdots & B_q & \text{Totals} \\ \hline
A_1 & x_{11} & x_{12} & \cdots & x_{1q} & qx_{1.} \\
A_2 & x_{21} & x_{22} & \cdots & x_{2q} & qx_{2.} \\
A_3 & x_{31} & x_{32} & \cdots & x_{3q} & qx_{3.} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
A_p & x_{p1} & x_{p2} & \cdots & x_{pq} & qx_{p.} \\ \hline
\text{Totals} & px_{.1} & px_{.2} & \cdots & px_{.q} & pqx_{..}
\end{array} \tag{23.13}$$

This is not a contingency table. The numbers are variate-values, not frequencies. As usual, $x_{j.}$ signifies the mean of values in the class $A_j$ and $x_{.k}$ the mean of values in the class $B_k$, $x_{..}$ being the mean of the whole.

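We have the algebraic identity

$$\begin{aligned}
\sum_{j,k}(x_{jk} - x_{..})^2 &= \sum_{j,k}\{(x_{jk} - x_{j.} - x_{.k} + x_{..}) + (x_{j.} - x_{..}) + (x_{.k} - x_{..})\}^2 \\
&= \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + \sum_{j,k}(x_{j.} - x_{..})^2 + \sum_{j,k}(x_{.k} - x_{..})^2 \\
&= \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 + q\sum_j(x_{j.} - x_{..})^2 + p\sum_k(x_{.k} - x_{..})^2,
\end{aligned} \tag{23.14}$$

the cross-product terms vanishing on summation in the usual way.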


23.13. We are interested in the variation of the $x$'s according to class membership. Let us take as our hypothesis that the $pq$ values are homogeneous, that is to say that they all emanate from (normal) populations with the same mean $m$ and variance $v$. In such a case class-membership exerts no influence on variate-values, and the observed differences are pure sampling effects.

The expression on the left in (23.14) is then distributed as $v\chi^2$ with $pq - 1$ degrees of freedom. The mean $x_{j.}$ is distributed normally with variance $v/q$ and thus $q\sum_j(x_{j.} - x_{..})^2$ is distributed as $v\chi^2$ with $p - 1$ d.f. Similarly, $p\sum_k(x_{.k} - x_{..})^2$ is so distributed with $q - 1$ d.f. Finally the remaining term on the right is distributed as $v\chi^2$ with $(p - 1)(q - 1)$ d.f.; for each term is normal with variance $\frac{(p-1)(q-1)}{pq}v$, since

$$x_{jk} - x_{j.} - x_{.k} + x_{..} = \Big(1 - \frac{1}{p}\Big)\Big(1 - \frac{1}{q}\Big)x_{jk} - \frac{p-1}{pq}\sum_{l \neq k}x_{jl} - \frac{q-1}{pq}\sum_{m \neq j}x_{mk} + \frac{1}{pq}\sum_{m \neq j}\sum_{l \neq k}x_{ml},$$

so that the sum of squares of coefficients on the right is

$$\frac{(p-1)^2(q-1)^2}{p^2q^2} + \frac{(q-1)(p-1)^2}{p^2q^2} + \frac{(p-1)(q-1)^2}{p^2q^2} + \frac{(p-1)(q-1)}{p^2q^2} = \frac{(p-1)(q-1)}{pq}. \tag{23.16}$$

Thus, since there are $p + q - 1$ linear relations connecting the $pq$ quantities

$$x_{jk} - x_{j.} - x_{.k} + x_{..},$$

their sum of squares is distributed as $v\chi^2$ with $pq - (p + q - 1) = (p - 1)(q - 1)$ degrees of freedom, which checks against the mean value of the individual square given by (23.16). We may thus analyse the variance in the following way:


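TABLE 23.2

Form of Analysis of Variance for Two-way Classification with One Member in each Subclass

$$\begin{array}{lccc}
 & \text{Sums of squares} & \text{d.f.} & \text{Quotient} \\ \hline
\text{Between } A\text{-classes} & q\sum_j(x_{j.} - x_{..})^2 & p - 1 & \frac{q}{p-1}\sum_j(x_{j.} - x_{..})^2 \\
\text{Between } B\text{-classes} & p\sum_k(x_{.k} - x_{..})^2 & q - 1 & \frac{p}{q-1}\sum_k(x_{.k} - x_{..})^2 \\
\text{Residual} & \sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 & (p - 1)(q - 1) & \frac{1}{(p-1)(q-1)}\sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 \\
\text{Totals} & \sum_{j,k}(x_{jk} - x_{..})^2 & pq - 1 & \frac{1}{pq-1}\sum_{j,k}(x_{jk} - x_{..})^2
\end{array}$$

The sums of squares and degrees of freedom (but not the quotients) are additive as before. It follows from the theorem of 23.6 that the three constituent sums are independent. Each quotient provides an unbiassed estimator of $v$.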
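The coefficient argument of 23.13 can be checked by writing the residual operation as a matrix. In the sketch below (Python with NumPy; the function name is illustrative) the residuals $x_{jk} - x_{j.} - x_{.k} + x_{..}$ of a $p \times q$ table, read row by row, are obtained from a Kronecker product of two centring matrices. Every row of that matrix has sum of squared coefficients $(p-1)(q-1)/pq$, and its rank is $(p - 1)(q - 1)$, the residual degrees of freedom.

```python
import numpy as np

def residual_operator(p, q):
    """Matrix R with R @ x.ravel() giving x_jk - x_j. - x_.k + x_.. for a
    p x q table x (row-major order)."""
    centre_p = np.eye(p) - np.full((p, p), 1.0 / p)
    centre_q = np.eye(q) - np.full((q, q), 1.0 / q)
    return np.kron(centre_p, centre_q)

p, q = 5, 4
R = residual_operator(p, q)
row_ss = (R ** 2).sum(axis=1)          # sum of squared coefficients per residual
print(row_ss[0], (p - 1) * (q - 1) / (p * q), np.linalg.matrix_rank(R))
```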

23.14. Our use of these results proceeds by an easy generalisation of the method exemplified in Example 23.1. We take as our hypothesis the supposition that all samples are from normal populations with identical mean and variance. Comparison of the estimates in the quotient column then provides a test of significance. If the hypothesis is rejected we may examine the alternative that means are different but variances identical throughout, in which case we shall find that the residual still provides an estimate of the variance, provided that an important additional assumption is made.


Example 23.2

The following data (Daniels, Supp. 1938, 5, 89) show the weight in grams of 95-yard lengths of wool thread from 100 "ends" being spun on four bobbins, 25 ends to the bobbin. We are interested in two factors, the variation between bobbins and the variation in the 25 ends on the same bobbin, according to their position.

TABLE 23.3

Weight in Grams of 100 95-yard Lengths of Wool Thread spun on Four Bobbins

                       Bobbin Number
End Number      1        2        3        4       Totals
     1        7.50     7.23     7.50     7.53      29.76
     2        7.52     7.81     7.77     8.05      31.15
     3        7.70     7.94     7.83     8.16      31.63
     4        7.93     7.94     7.96     7.76      31.59
     5        7.78     7.89     8.02     7.85      31.54
     6        7.73     8.23     7.99     8.14      32.09
     7        8.07     8.27     8.25     8.26      32.85
     8        8.01     8.54     8.24     8.54      33.33
     9        8.22     8.24     8.37     8.10      32.93
    10        8.24     8.35     8.43     8.15      33.17
    11        8.17     8.29     8.46     8.38      33.30
    12        8.09     8.54     8.33     8.47      33.43
    13        8.11     8.45     8.27     8.38      33.21
    14        7.96     8.43     8.24     8.60      33.23
    15        8.09     8.47     8.12     8.45      33.13
    16        8.04     8.33     8.14     8.43      32.94
    17        7.78     8.47     8.19     8.57      33.01
    18        8.11     8.63     8.36     8.38      33.48
    19        8.17     8.31     8.31     8.16      32.95
    20        8.12     8.31     8.47     8.41      33.31
    21        8.13     8.10     8.19     8.27      32.69
    22        8.01     8.01     8.37     7.96      32.35
    23        8.17     7.92     8.27     8.08      32.44
    24        8.05     8.27     8.07     8.16      32.55
    25        7.91     7.92     8.28     8.52      32.63
  Totals    199.61   204.89   204.43   205.76     814.69


It simplifies the arithmetic if we take a working mean at 8.00. The total sum of squares about this mean is then found to be

$$\sum(x_{jk} - 8)^2 = 9.3829,$$

and we have also

$$\sum(x_{jk} - 8) = 14.69.$$

Hence

$$\sum(x_{jk} - x_{..})^2 = 9.3829 - (0.1469)(14.69) = 7.224939.$$

The means of the four bobbins are

7.9844, 8.1956, 8.1772, 8.2304.

With the same working mean we find for the sum of squares

$$\sum_k(x_{.k} - 8)^2 = 0.12298672;$$
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and hence 

p2:(x^k - == 25 (0 122,986,72) - (0*1469) (14*69) 

= 0*916,707. 

The means of the four ends of corresponding position on the four bobbins can, of 
course, be found from the totals in the last column of the table, but it is simpler to find 
then divide by We find 

^ - (01469) (14-69) 

ID 

= 4*637,814. 

The continual appearance of the factor (0*1469) (14*69) ~ Nx]^ is to be noted. The 
quantity is best computed once for all at the outset. 
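As an arithmetic check on this example, the Python sketch below recomputes the analysis from the individual weights, taking deviations from the working mean 8.00 and forming the correction $N x_{..}^2$ only once, as recommended above. The list `WOOL` is our transcription of Table 23.3 (ends as rows, bobbins as columns), read so that it reproduces the printed row and bobbin totals and the sum of squares 9.3829; the function name `two_way_analysis` is illustrative and not taken from the text.

```python
# Weights from Table 23.3: one row per end (j = 1..25), one column per bobbin (k = 1..4).
WOOL = [
    [7.50, 7.23, 7.50, 7.53], [7.52, 7.81, 7.77, 8.05], [7.70, 7.94, 7.83, 8.16],
    [7.93, 7.94, 7.96, 7.76], [7.78, 7.89, 8.02, 7.85], [7.73, 8.23, 7.99, 8.14],
    [8.07, 8.27, 8.25, 8.26], [8.01, 8.54, 8.24, 8.54], [8.22, 8.24, 8.37, 8.10],
    [8.24, 8.35, 8.43, 8.15], [8.17, 8.29, 8.46, 8.38], [8.09, 8.54, 8.33, 8.47],
    [8.11, 8.45, 8.27, 8.38], [7.96, 8.43, 8.24, 8.60], [8.09, 8.47, 8.12, 8.45],
    [8.04, 8.33, 8.14, 8.43], [7.78, 8.47, 8.19, 8.57], [8.11, 8.63, 8.36, 8.38],
    [8.17, 8.31, 8.31, 8.16], [8.12, 8.31, 8.47, 8.41], [8.13, 8.10, 8.19, 8.27],
    [8.01, 8.01, 8.37, 7.96], [8.17, 7.92, 8.27, 8.08], [8.05, 8.27, 8.07, 8.16],
    [7.91, 7.92, 8.28, 8.52],
]


def two_way_analysis(table, working_mean=0.0):
    """Sums of squares and d.f. for a p x q table with one observation per cell."""
    p, q = len(table), len(table[0])
    n = p * q
    d = [[x - working_mean for x in row] for row in table]   # deviations from working mean
    grand = sum(sum(row) for row in d)
    correction = grand * grand / n          # N * x..^2, computed once for all

    total = sum(x * x for row in d for x in row) - correction
    row_totals = [sum(row) for row in d]                          # one total per end
    col_totals = [sum(row[k] for row in d) for k in range(q)]     # one total per bobbin
    rows = sum(t * t for t in row_totals) / q - correction
    cols = sum(t * t for t in col_totals) / p - correction
    residual = total - rows - cols                                # obtained by subtraction
    return {
        "between columns": (cols, q - 1),
        "between rows": (rows, p - 1),
        "residual": (residual, (p - 1) * (q - 1)),
        "totals": (total, n - 1),
    }


for name, (ss, df) in two_way_analysis(WOOL, working_mean=8.00).items():
    print(f"{name:16s} {ss:10.6f} {df:3d} {ss / df:8.4f}")
```

Changing `working_mean` (for instance to 0) leaves every sum of squares unchanged; the working mean only keeps the numbers small, which mattered far more for hand computation than it does here.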

The residual sum of squares is then obtainable by subtraction, and we have the following analysis:


TABLE 23.4

Analysis of Variance for the Data of Table 23.3.

                     Sums of Squares.    d.f.    Quotient.
Between bobbins         0.916,707          3      0.3056
Between ends            4.637,814         24      0.1932
Residual                1.670,418         72      0.0232
Totals                  7.224,939         99      0.0730

The variation between bobbins and that between ends are both significant: the ratio of the corresponding quotients to the residual quotient (about 13 and 8 respectively) is so big in each case as hardly to require the z-test. We are led to suspect that the variation between bobbins, small as it is, cannot be a chance effect, and it looks as if bobbin number 1 is not getting its fair share of thread. Similarly, the weight of thread seems to depend on whereabouts the thread is spun on the bobbins, and an inspection of the original data suggests a systematic variation as we proceed along the bobbin from end number 1 to end number 25, with a possible maximum in the middle. If the manufacturing process is to be standardised as much as possible, we should have to examine the reasons for the shortage of weight on the first bobbin and for this systematic effect of position on the bobbin.

23.15. Suppose now that, as in the example just given, the hypothesis of homogeneity is rejected. What interpretation can we put on the residual quotient? Let us assume that each observation comes from a normal population with variance v, but that the parent mean of the subclass $A_jB_k$ is $m_{jk}$, these quantities varying from one subclass to another. Is the residual quotient an unbiassed estimator of v? In general the answer is “ no ”, but there is an important class of case in which it is affirmative.

Let $m_{j.}$ be the mean of the q values of $m_{jk}$ in the class $A_j$, $m_{.k}$ that of the p values in $B_k$, and $m_{..}$ the mean of the whole set of m's. Then we may write

$$x_{jk} = m_{jk} + \xi_{jk}, \qquad (23.16)$$

where the $\xi$'s are normal with zero mean and variance v, and correspondingly

$$x_{j.} = m_{j.} + \xi_{j.}, \qquad x_{.k} = m_{.k} + \xi_{.k}, \qquad x_{..} = m_{..} + \xi_{..}. \qquad (23.17)$$

Then

$$\sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = \sum_{j,k}(m_{jk} - m_{j.} - m_{.k} + m_{..} + \xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2$$
$$= \sum_{j,k}(m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + \sum_{j,k}(\xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..})^2 + 2\sum_{j,k}(m_{jk} - m_{j.} - m_{.k} + m_{..})(\xi_{jk} - \xi_{j.} - \xi_{.k} + \xi_{..}), \qquad (23.18)$$

and the product term has zero expectation, since the $\xi$'s have zero mean. The second term on the right has expectation (p - 1)(q - 1)v, for the $\xi$'s are distributed with variance v about zero mean, so that the term in question is the residual sum of squares in a p × q two-way classification of a homogeneous sample and hence has the stated expectation. Thus we have

$$E\sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = \sum_{j,k}(m_{jk} - m_{j.} - m_{.k} + m_{..})^2 + (p-1)(q-1)v. \qquad (23.19)$$

The residual quotient will then provide an unbiassed estimator of v if and only if

$$m_{jk} - m_{j.} - m_{.k} + m_{..} = 0 \quad \text{for all } j, k. \qquad (23.20)$$


23.16. Now suppose that $x_{jk}$ is made up of three parts which are additive, viz.

(1) the effect of the class $A_j$, say $a_j$;

(2) the effect of the class $B_k$, say $b_k$; and

(3) a residual $\xi_{jk}$ which is normal and has zero mean.

This kind of hypothesis will recur frequently. It amounts to an assumption that there is in $x_{jk}$ an element $a_j$ which affects alike all members of the class $A_j$ but varies from one A-class to another; an element $b_k$ which similarly affects alike all members of $B_k$ but varies from B-class to B-class; and a third component representing random variation which, apart from the sampling factor, is the same for all subclasses $A_jB_k$. We then have

$$x_{jk} = a_j + b_k + \xi_{jk} \qquad (23.21)$$

and

$$m_{jk} = a_j + b_k, \qquad m_{j.} = a_j + b_., \qquad m_{.k} = a_. + b_k, \qquad m_{..} = a_. + b_., \qquad (23.22)$$

where, as usual, the subscript periods in the a's and b's denote averaging. Thus

$$m_{jk} - m_{j.} - m_{.k} + m_{..} = (a_j + b_k) - (a_j + b_.) - (a_. + b_k) + a_. + b_. = 0,$$

so that (23.20) is satisfied and the residual quotient is an unbiassed estimator of the variance v.

Under the same conditions it will be found that

$$qE\sum_j (x_{j.} - x_{..})^2 = (p-1)v + q\sum_j (m_{j.} - m_{..})^2 = (p-1)v + q\sum_j (a_j - a_.)^2, \qquad (23.23)$$

$$pE\sum_k (x_{.k} - x_{..})^2 = (q-1)v + p\sum_k (b_k - b_.)^2, \qquad (23.24)$$

$$E\sum_{j,k} (x_{jk} - x_{..})^2 = (pq-1)v + \sum_{j,k}(a_j - a_. + b_k - b_.)^2 = (pq-1)v + q\sum_j (a_j - a_.)^2 + p\sum_k (b_k - b_.)^2. \qquad (23.25)$$
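The content of (23.19) and (23.20) can be illustrated by a small simulation. The Python sketch below is our own illustration, not part of the text: it draws tables $x_{jk} = m_{jk} + \xi_{jk}$ with independent normal $\xi$ of variance v and averages the residual quotient. With additive means $m_{jk} = a_j + b_k$ the average should settle near v; with the non-additive means $m_{jk} = a_j b_k$ it should settle near $v + \sum(m_{jk} - m_{j.} - m_{.k} + m_{..})^2/\{(p-1)(q-1)\}$, as (23.19) requires. The effects, the value of v and the seeds are arbitrary choices.

```python
import random


def residual_quotient(x):
    """Residual sum of squares of a p x q table divided by (p - 1)(q - 1)."""
    p, q = len(x), len(x[0])
    row = [sum(r) / q for r in x]                                 # x_j.
    col = [sum(x[j][k] for j in range(p)) / p for k in range(q)]  # x_.k
    grand = sum(row) / p                                          # x..
    ss = sum((x[j][k] - row[j] - col[k] + grand) ** 2
             for j in range(p) for k in range(q))
    return ss / ((p - 1) * (q - 1))


def mean_residual_quotient(m, v, reps, seed=0):
    """Average residual quotient over simulated tables x_jk = m_jk + xi_jk."""
    rng = random.Random(seed)
    sd = v ** 0.5
    acc = 0.0
    for _ in range(reps):
        x = [[mjk + rng.gauss(0.0, sd) for mjk in row] for row in m]
        acc += residual_quotient(x)
    return acc / reps


a = [0.0, 1.0, 3.0, -2.0, 5.0]     # illustrative A-class effects
b = [0.0, 2.0, -1.0, 4.0]          # illustrative B-class effects
v = 0.5

additive = [[aj + bk for bk in b] for aj in a]     # satisfies (23.20)
product = [[aj * bk for bk in b] for aj in a]      # does not satisfy (23.20)
print(residual_quotient(additive))                             # essentially zero
print(mean_residual_quotient(additive, v, 4000, seed=1))       # should be close to v
print(v + residual_quotient(product),
      mean_residual_quotient(product, v, 4000, seed=2))        # should be close to each other
```

Applying `residual_quotient` to the parent means themselves gives exactly the bias term of (23.19) divided by the degrees of freedom, which is why it serves as the predicted excess over v.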

23.17. We have supposed that the component $\xi$ had a zero mean, but of course if all these components had the same mean, the constant common to them could be absorbed into the functions $a_j$ and $b_k$. Our hypothesis is thus a little more general than it appears. In certain practical cases it is a plausible hypothesis to make. For instance, in Example 23.2 it is reasonable to suppose that the effect of a particular bobbin is the same for all ends, and the effect of situation the same for all bobbins. If there is any serious doubt on the point we have to collect further data and consider interactions in the manner described later (see 23.22).

It may, however, be noted that if the variation of the quantities $m_{jk} - m_{j.} - m_{.k} + m_{..}$ is comparatively small, the appearance of the term containing them in (23.19) does not materially vitiate an estimate of v from the residual quotient. In any case that estimate will on the average exceed v, so that our inferences about significant differences of mean values will, properly interpreted, be on the safe side.

23.18. Before going farther we may remark that the quantity we have called the 
residual sum of squares and the associated quotient are often referred to as “ error ” or 
“ interaction ” terms. The former is likely to cause misunderstanding and is better avoided 
altogether, for, as we have seen, it provides a measure of sampling variance, and there- 
fore of experimental error, only in particular cases. The word “ interaction ” we shall 
define below ; it has been used in different senses by different writers, and when consulting 
original memoirs the reader should endeavour to ascertain the precise meaning which 
is being attached to it, if he can. In considering a given analysis it is as well to reflect
on the precise nature of the items covered by such expressions as “ residual ”, “ remainder ”,
“ error ” and so forth.


Three-way Classification 

23.19. Consider now the case when there are three classifications into A-, B- and C-classes. As before, we shall consider in the first place one member in each subclass $A_jB_kC_l$, typified by $x_{jkl}$. We now have

$$\sum_{j,k,l}(x_{jkl} - x_{...})^2 = \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2 + \sum (x_{..l} - x_{...})^2$$
$$+ \sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 + \sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2 + \sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2$$
$$+ \sum (x_{jkl} - x_{jk.} - x_{.kl} - x_{j.l} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2, \qquad (23.26)$$

the summations extending over all members of the sample, pqr in number, so that we may replace expressions such as $\sum (x_{j..} - x_{...})^2$ by $qr\sum_j (x_{j..} - x_{...})^2$, etc.

On the usual hypothesis of normality and homogeneity we find that the first three terms on the right of (23.26) are distributed as $v\chi^2$ with p - 1, q - 1 and r - 1 degrees of freedom. The second group is so distributed with (p - 1)(q - 1), (q - 1)(r - 1) and (r - 1)(p - 1) degrees of freedom. The last is distributed with (p - 1)(q - 1)(r - 1) degrees of freedom. All but the last of these results follow from the two-way case, and the last may be established as in 23.13, or by the consideration that for any fixed l the term has (p - 1)(q - 1) degrees of freedom and that there are (r - 1) independent l's.

We may then write the analysis in the form shown in Table 23.5. (For the present the expression “ interaction AB ” is to be regarded merely as a name given to a particular sum of squares. As before, the sums of squares and degrees of freedom are additive, and the seven items into which the total sum of squares is analysed are distributed independently.)

TABLE 23.5

Form of Analysis of Variance for Three-way Classification with One Member in each Subclass.

                     Sum of Squares.                                                               d.f.
Between A-classes    $\sum (x_{j..} - x_{...})^2$                                                   p - 1
Between B-classes    $\sum (x_{.k.} - x_{...})^2$                                                   q - 1
Between C-classes    $\sum (x_{..l} - x_{...})^2$                                                   r - 1
Interaction AB       $\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2$                               (p - 1)(q - 1)
Interaction BC       $\sum (x_{.kl} - x_{.k.} - x_{..l} + x_{...})^2$                               (q - 1)(r - 1)
Interaction CA       $\sum (x_{j.l} - x_{j..} - x_{..l} + x_{...})^2$                               (r - 1)(p - 1)
Residual             $\sum (x_{jkl} - x_{jk.} - x_{.kl} - x_{j.l} + x_{j..} + x_{.k.} + x_{..l} - x_{...})^2$   (p - 1)(q - 1)(r - 1)
TOTALS               $\sum (x_{jkl} - x_{...})^2$                                                   pqr - 1

Quotient: in each line, the sum of squares divided by the corresponding d.f.
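For machine computation the seven items of Table 23.5 can be obtained directly from the means. The following Python sketch (the function name and the array layout `x[j][k][l]` are our own assumptions) does this for a p × q × r array with one member in each subclass. Since the items are additive, the fact that the seven components sum to the total provides a check on the arithmetic.

```python
from itertools import product


def three_way_components(x):
    """Sums of squares and d.f. of Table 23.5 for an array x[j][k][l], one value per subclass."""
    p, q, r = len(x), len(x[0]), len(x[0][0])
    cells = list(product(range(p), range(q), range(r)))
    g = sum(x[j][k][l] for j, k, l in cells) / (p * q * r)                                   # x...
    a = [sum(x[j][k][l] for k in range(q) for l in range(r)) / (q * r) for j in range(p)]   # x_j..
    b = [sum(x[j][k][l] for j in range(p) for l in range(r)) / (p * r) for k in range(q)]   # x_.k.
    c = [sum(x[j][k][l] for j in range(p) for k in range(q)) / (p * q) for l in range(r)]   # x_..l
    ab = [[sum(x[j][k]) / r for k in range(q)] for j in range(p)]                           # x_jk.
    bc = [[sum(x[j][k][l] for j in range(p)) / p for l in range(r)] for k in range(q)]      # x_.kl
    ca = [[sum(x[j][k][l] for k in range(q)) / q for l in range(r)] for j in range(p)]      # x_j.l

    def ss(term):
        """Sum of squares of term(j, k, l) over all pqr subclasses."""
        return sum(term(j, k, l) ** 2 for j, k, l in cells)

    return {
        "A": (ss(lambda j, k, l: a[j] - g), p - 1),
        "B": (ss(lambda j, k, l: b[k] - g), q - 1),
        "C": (ss(lambda j, k, l: c[l] - g), r - 1),
        "AB": (ss(lambda j, k, l: ab[j][k] - a[j] - b[k] + g), (p - 1) * (q - 1)),
        "BC": (ss(lambda j, k, l: bc[k][l] - b[k] - c[l] + g), (q - 1) * (r - 1)),
        "CA": (ss(lambda j, k, l: ca[j][l] - a[j] - c[l] + g), (r - 1) * (p - 1)),
        "residual": (ss(lambda j, k, l: x[j][k][l] - ab[j][k] - bc[k][l] - ca[j][l]
                        + a[j] + b[k] + c[l] - g), (p - 1) * (q - 1) * (r - 1)),
        "total": (ss(lambda j, k, l: x[j][k][l] - g), p * q * r - 1),
    }


example = [[[(3 * j + 5 * k + 7 * l + j * k * l) % 11 for l in range(3)]
            for k in range(4)] for j in range(2)]
parts = three_way_components(example)
for name, (value, df) in parts.items():
    print(f"{name:9s} {value:9.4f} {df:3d}")
# Additivity check: the seven items add up to the total.
print(sum(s for n, (s, _) in parts.items() if n != "total"), parts["total"][0])
```

For exactly additive data of the form $a_j + b_k + c_l$ the three interactions and the residual all vanish, in line with 23.20.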



23.20. If the hypothesis of homogeneity is rejected we may consider the alternative represented by

$$x_{jkl} = a_j + b_k + c_l + \xi_{jkl}, \qquad (23.27)$$

where, as usual, $\xi$ is normal with zero mean. As in 23.16 it will be found that the residual term in Table 23.5 has expectation (p - 1)(q - 1)(r - 1)v and hence continues to provide an unbiassed estimator of v. The quotients between classes are affected like those in equations (23.23) to (23.25); but the interaction terms also provide estimators of v with the appropriate degrees of freedom. For instance,

$$x_{jk.} - x_{j..} - x_{.k.} + x_{...} = (a_j + b_k + c_. + \xi_{jk.}) - (a_j + b_. + c_. + \xi_{j..}) - (a_. + b_k + c_. + \xi_{.k.}) + (a_. + b_. + c_. + \xi_{...})$$
$$= \xi_{jk.} - \xi_{j..} - \xi_{.k.} + \xi_{...}, \qquad (23.28)$$

so that the expectation of the sum of squares of the x-terms is that of the $\xi$-terms, which we know to be (p - 1)(q - 1)v.

23.21. This brings up a new point arising for the first time in the three-way classification. If (23.27) is true, the analysis of variance will provide four different estimators of the variance v, namely the interactions AB, BC and CA and the residual. These are independent (for they depend only on the $\xi$'s, and the theory appropriate to the case of homogeneity continues to apply) and their ratios may be tested in the z-distribution. If these ratios are such as can have arisen from random sampling we may accept the hypothesis represented by (23.27); if not we must reject it. In short, the interaction quotients provide a test of the hypothesis (23.27). In the two-way classification no such test is available.

Interactions

23.22. On the hypothesis (23.27) the interaction quotients of type AB give unbiassed estimators of the variance v. If in any particular case these quotients differ significantly among themselves or from any other independent estimator of v, we have to reject the hypothesis. Apart from the normality of the variation of $\xi$, which is not for the moment in question, this means that we cannot represent the data as the sum of separate effects due to A-, B- and C-classes, together with a residual $\xi$ which is the same in form for all subclasses. The effects of the classes are entangled or, as we may say, they interact. This is the origin of the term “ interaction ”.

Suppose, for instance, our data are crop-yields, and membership of the three classes 
corresponds to applications of three manures, nitrogen (A), potash (B) and phosphate (C). 
The hypothesis represented by (23.27) would then be equivalent to supposing that all three 
manures exerted an effect on yields, but that they did so independently. A given dressing 
of nitrogen would increase the yield by $a_j$, whatever dressings of the other fertilisers were
applied. But it might happen that the response in yield to $a_j$ varied according to how
much of the others were present; potash might either stimulate the effect of nitrogen or
inhibit it. If this were so, the fertilisers would interact and the hypothesis (23.27) would
break down. Significant departures from homogeneity in the interaction terms usually
lead us to search for possible entanglements of this kind.

23.23. It must not be overlooked, however, that significant interactions do not necessarily imply interaction in any real sense. They may arise from heterogeneity in the data. To return to our example of crop-yields, suppose the yields were taken from a series of plots which differed materially in natural fertility. It might very well be found that the hypothesis (23.27) could not be justified even if the differences in yields due to the natural effect were partially absorbed into the coefficients a, b and c. If by chance the heavier dressings of fertilisers were applied to plots of greater fertility, the hypothesis might be shown as failing and “ significant ” interactions appear. Such points as this require careful consideration in the interpretation of significance, and we shall illustrate them in some examples below.

23.24. Interactions of type AB, involving two classes, are said to be of the first order. When considering the general n-way classification we shall see that there can appear interactions of second, third, fourth, ... order. In fact, the residual in Table 23.5 is formally equivalent to an interaction of the second order, of type ABC, just as the first-order interaction is equivalent to the residual in the two-way analysis of Table 23.2.

To complete the definitions, we may define the sum of squares between A-classes as an interaction of order zero. The seven constituent items in Table 23.5 would then correspond to the following:

Interaction.             d.f.

Order zero    A          p - 1
              B          q - 1
              C          r - 1

Order 1       AB         (p - 1)(q - 1)
              BC         (q - 1)(r - 1)
              CA         (r - 1)(p - 1)

Order 2       ABC        (p - 1)(q - 1)(r - 1)


This illustrates the general symmetry of the analysis and suggests obvious generalisations.

n-way Classifications

23.25. For instance, with five classes A, B, C, D and E we may analyse the total sum of squares into $2^5 - 1 = 31$ components. There will be 5 interactions of order zero; (5 × 4)/(1 × 2) = 10 interactions of first order, type AB; (5 × 4 × 3)/(1 × 2 × 3) = 10 interactions of second order, type ABC; (5 × 4 × 3 × 2)/(1 × 2 × 3 × 4) = 5 interactions of third order, type ABCD; and one residual or interaction of fourth order, type ABCDE. The interactions of zero, first and second order are of a type already familiar:

$$\sum (x_{j....} - x_{.....})^2,$$
$$\sum (x_{jk...} - x_{j....} - x_{.k...} + x_{.....})^2,$$
$$\sum (x_{jkl..} - x_{jk...} - x_{.kl..} - x_{j.l..} + x_{j....} + x_{.k...} + x_{..l..} - x_{.....})^2. \qquad (23.29)$$

The third-order interactions are typified by

$$\sum (x_{jklm.} - x_{jkl..} - x_{jk.m.} - x_{j.lm.} - x_{.klm.} + x_{jk...} + x_{j.l..} + x_{j..m.} + x_{.kl..} + x_{.k.m.} + x_{..lm.} - x_{j....} - x_{.k...} - x_{..l..} - x_{...m.} + x_{.....})^2, \qquad (23.30)$$

and the reader will be able to write down the residual for himself.

As usual, the 31 terms all furnish independent estimators of the variance on the hypothesis of homogeneity, and if this is rejected we may consider the alternative represented by

$$x_{jklmn} = a_j + b_k + c_l + d_m + e_n + \xi_{jklmn}. \qquad (23.31)$$

The complete analysis in such cases may become very complex, but frequently it is sufficient to consider only sums of squares suggested for investigation by prior expectations.
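The counting in 23.25 amounts to taking the n classifications s at a time for an interaction of order s - 1. The Python sketch below lists every item with its degrees of freedom, assuming that the d.f. of an item is the product of (number of classes - 1) over the classifications it involves, which is the pattern of Table 23.5 extended in the obvious way. The function name and the numbers of classes are illustrative assumptions.

```python
from itertools import combinations
from math import comb, prod


def interaction_items(levels):
    """Return (name, order, d.f.) for every item of an n-way analysis of variance."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    n = len(levels)
    items = []
    for size in range(1, n + 1):                  # size 1 = order zero, ..., size n = residual
        for subset in combinations(range(n), size):
            name = "".join(letters[i] for i in subset)
            df = prod(levels[i] - 1 for i in subset)
            items.append((name, size - 1, df))
    return items


levels = [2, 3, 4, 5, 6]       # illustrative numbers of classes for A, B, C, D, E
items = interaction_items(levels)
for order in range(len(levels)):
    count = sum(1 for _, o, _ in items if o == order)
    print(f"order {order}: {count} items (binomial coefficient {comb(len(levels), order + 1)})")
print(len(items), "items;", sum(df for _, _, df in items), "d.f. in all; N - 1 =", prod(levels) - 1)
```

For five classifications the counts are 5, 10, 10, 5 and 1, and the degrees of freedom of all 31 items add up to N - 1, as they must.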

Example 23.3 

The following data show the percentage water-content in a number of samples of 
a commercial product. Six samples were chosen ; each sample was tested by four different 
operators ; and each operator carried out the determination by three different methods. 
We have thus a 6 x 4 x 3 classification. 

TABLE 23.6

Percentage Water-Content of Six Samples determined by Four Operators using Three Methods.

Operators.          1             2             3             4
Tests.          1   2   3     1   2   3     1   2   3     1   2   3
Samples.
   1           59  61  61    57  60  58    55  58  62    54  56  59
   2           57  58  60    57  58  58    61  60  57    60  56  58
   3           55  57  59    55  55  56    54  52  58    53  55  55
   4           60  57  58    56  57  57    54  58  55    61  59  58
   5           61  61  60    59  58  59    61  57  60    62  60  60
   6           63  59  60    62  63  61    64  62  59    59  60  61

We will first of all analyse the variance systematically with rather more arithmetical
detail than is usually required, in order to illustrate the process.

A great deal of work is saved if we take a mean at 60. The table then becomes:


TABLE 23.7

Operators.            1                  2                  3                  4
Tests.           1   2   3  Tot.    1   2   3  Tot.    1   2   3  Tot.    1   2   3  Tot.    TOTALS
Samples.
   1            -1   1   1    1    -3   0  -2   -5    -5  -2   2   -5    -6  -4  -1  -11      -20
   2            -3  -2   0   -5    -3  -2  -2   -7     1   0  -3   -2     0  -4  -2   -6      -20
   3            -5  -3  -1   -9    -5  -5  -4  -14    -6  -8  -2  -16    -7  -5  -5  -17      -56
   4             0  -3  -2   -5    -4  -3  -3  -10    -6  -2  -5  -13     1  -1  -2   -2      -30
   5             1   1   0    2    -1  -2  -1   -4     1  -3   0   -2     2   0   0    2       -2
   6             3  -1   0    2     2   3   1    6     4   2  -1    5    -1   0   1    0       13
Totals          -5  -7  -2  -14   -14  -9 -11  -34   -11 -13  -9  -33   -11 -14  -9  -34     -115


We have shown the totals of the tests for each operator, of the tests for all operators, and 
of samples for each test. 

We now form three two-way tables from this by adding over each of the three classifications in turn:


TABLE 23.8

                  Operators.
Samples.     1      2      3      4     Totals.
   1         1     -5     -5    -11      -20
   2        -5     -7     -2     -6      -20
   3        -9    -14    -16    -17      -56
   4        -5    -10    -13     -2      -30
   5         2     -4     -2      2       -2
   6         2      6      5      0       13
Totals     -14    -34    -33    -34     -115


TABLE 23.9

                Tests.
Samples.     1      2      3     Totals.
   1       -15     -5      0      -20
   2        -5     -8     -7      -20
   3       -23    -21    -12      -56
   4        -9     -9    -12      -30
   5         3     -4     -1       -2
   6         8      4      1       13
Totals     -41    -43    -31     -115


TABLE 23.10

                  Operators.
Tests.       1      2      3      4     Totals.
   1        -5    -14    -11    -11      -41
   2        -7     -9    -13    -14      -43
   3        -2    -11     -9     -9      -31
Totals     -14    -34    -33    -34     -115

As we have inserted the totals of various kinds in Table 23.7 these subsidiary tables 
can be picked out at once ; but in general, totals are not available in the original (and for 
four-way classifications it is difficult to find a form of tabular presentation which will permit 
of their insertion) so that the tables have to be separately compiled. In practice I find it 
convenient to do so in any case to avoid picking out the wrong figures in the original table. 

Pursuing the condensation process, we should now derive three one-way tables from 
Tables 23.8 to 23.10, but in fact the row and column totals already give us what is required 
(and incidentally provide a check on the arithmetic). 

Now we proceed to find the various sums of squares. For the total of all observations we find -115, and for the sum of squares of observations 653. Thus

$$x_{...} = -\frac{115}{72} = -1.597,222,$$
$$N x_{...}^2 = \frac{115^2}{72} = 183.680,556,$$
$$\sum (x_{jkl} - x_{...})^2 = \sum x_{jkl}^2 - N x_{...}^2 = 653 - 183.680,556 = 469.319,444, \qquad (23.32)$$

with 6 × 4 × 3 - 1 = 71 degrees of freedom.
For the interactions of order zero we require the sums of type

$$\sum (x_{j..} - x_{...})^2 = \sum x_{j..}^2 - N x_{...}^2,$$

where summation takes place over the N values. It is, however, unnecessary to work out the means $x_{j..}$. Consider, for example, the sum of squares between samples. From the totals of Table 23.8 or Table 23.9 we find (j denoting samples)

$$\sum_j (12 x_{j..})^2 = (-20)^2 + (-20)^2 + \cdots + 13^2 = 5009,$$

where the summation is over six values only. Thus, for summation over the 72 values,

$$\sum x_{j..}^2 = \tfrac{1}{12}(5009) = 417.416,667.$$

Hence

$$\sum (x_{j..} - x_{...})^2 = 417.416,667 - 183.680,556 = 233.736,111 \qquad (23.33)$$

with 6 - 1 = 5 d.f.

Similarly (k denoting operators) we find

$$\sum (x_{.k.} - x_{...})^2 = \tfrac{1}{18}\{(-14)^2 + (-34)^2 + (-33)^2 + (-34)^2\} - 183.680,556 = 16.152,778 \qquad (23.34)$$

with 3 d.f.; and (l denoting tests)

$$\sum (x_{..l} - x_{...})^2 = \tfrac{1}{24}\{(-41)^2 + (-43)^2 + (-31)^2\} - 183.680,556 = 3.444,444 \qquad (23.35)$$

with two degrees of freedom.

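Now we require first-order interactions. We have (summation being over the N values)

$$\sum (x_{jk.} - x_{j..} - x_{.k.} + x_{...})^2 = \sum (x_{jk.} - x_{...})^2 + \sum (x_{j..} - x_{...})^2 + \sum (x_{.k.} - x_{...})^2$$
$$- 2\sum (x_{jk.} - x_{...})(x_{j..} - x_{...}) - 2\sum (x_{jk.} - x_{...})(x_{.k.} - x_{...}) + 2\sum (x_{j..} - x_{...})(x_{.k.} - x_{...})$$
$$= \sum (x_{jk.} - x_{...})^2 - \sum (x_{j..} - x_{...})^2 - \sum (x_{.k.} - x_{...})^2, \qquad (23.36)$$

since, on summing first over k, $\sum (x_{jk.} - x_{...})(x_{j..} - x_{...}) = \sum (x_{j..} - x_{...})^2$; similarly $\sum (x_{jk.} - x_{...})(x_{.k.} - x_{...}) = \sum (x_{.k.} - x_{...})^2$; and the last product term vanishes. Thus the first-order interaction term is ascertainable from $\sum x_{jk.}^2$ and quantities which have already been computed.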

From the body of Table 23.8 (remembering that summation relates to 72 values and hence that each value in the table is counted 3 times) we find

$$\sum x_{jk.}^2 = \frac{3}{3^2}\{1^2 + (-5)^2 + \cdots\} = \frac{1499}{3} = 499.666,667.$$

The interaction term is then

$$499.666,667 - 183.680,556 - 233.736,111 - 16.152,778 = 66.097,222 \qquad (23.37)$$

with (6 - 1)(4 - 1) = 15 d.f.

Similarly in the body of Table 23.9 we find for the sum of squares 1915. Hence the interaction of samples and tests is

$$\frac{1915}{4} - 183.680,556 - 233.736,111 - 3.444,444 = 57.888,889. \qquad (23.38)$$


A.S. — ^VOL. II. 


O 
In the body of Table 23.10 the sum of squares is 1245. Hence the interaction of tests and operators is

$$\frac{1245}{6} - 183.680,556 - 16.152,778 - 3.444,444 = 4.222,222. \qquad (23.39)$$

Finally, the residual is given by the difference of the total sum of squares and the interactions already found, namely by

$$469.319,444 - 233.736,111 - 16.152,778 - 3.444,444 - 66.097,222 - 57.888,889 - 4.222,222 = 87.777,778 \qquad (23.40)$$

with (6 - 1)(4 - 1)(3 - 1) = 30 degrees of freedom.
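The whole of the arithmetic of this example can be reproduced mechanically. The Python sketch below follows the procedure just described: it works from the totals of the condensed tables (Tables 23.8 to 23.10) rather than from the means, uses (23.36) for the first-order interactions and obtains the residual by subtraction. The array `WATER` is our transcription of Table 23.7 (observations less 60, indexed by sample, operator and test); it reproduces the totals -115 and 653 and the sums 5009, 1499, 1915 and 1245 quoted above. The function names are illustrative.

```python
# Table 23.7 (observations less 60), indexed WATER[sample][operator][test].
WATER = [
    [[-1, 1, 1], [-3, 0, -2], [-5, -2, 2], [-6, -4, -1]],
    [[-3, -2, 0], [-3, -2, -2], [1, 0, -3], [0, -4, -2]],
    [[-5, -3, -1], [-5, -5, -4], [-6, -8, -2], [-7, -5, -5]],
    [[0, -3, -2], [-4, -3, -3], [-6, -2, -5], [1, -1, -2]],
    [[1, 1, 0], [-1, -2, -1], [1, -3, 0], [2, 0, 0]],
    [[3, -1, 0], [2, 3, 1], [4, 2, -1], [-1, 0, 1]],
]


def squared_mean_sum(totals, count):
    """Sum over all N observations of (group mean)^2, from group totals of `count` values each."""
    return sum(t * t for t in totals) / count


def analyse_by_totals(x):
    """Analysis of variance of a samples x operators x tests array, working from totals."""
    p, q, r = len(x), len(x[0]), len(x[0][0])
    cells = [(j, k, l) for j in range(p) for k in range(q) for l in range(r)]
    grand = sum(x[j][k][l] for j, k, l in cells)
    corr = grand * grand / (p * q * r)                 # N * x...^2, computed once
    total = sum(x[j][k][l] ** 2 for j, k, l in cells) - corr

    def group_totals(key):
        """Totals of the observations grouped by key(j, k, l)."""
        out = {}
        for j, k, l in cells:
            out[key(j, k, l)] = out.get(key(j, k, l), 0) + x[j][k][l]
        return list(out.values())

    S = squared_mean_sum(group_totals(lambda j, k, l: j), q * r) - corr
    O = squared_mean_sum(group_totals(lambda j, k, l: k), p * r) - corr
    T = squared_mean_sum(group_totals(lambda j, k, l: l), p * q) - corr
    # First-order interactions via (23.36): body of a two-way table less its margins.
    SO = squared_mean_sum(group_totals(lambda j, k, l: (j, k)), r) - corr - S - O
    ST = squared_mean_sum(group_totals(lambda j, k, l: (j, l)), q) - corr - S - T
    OT = squared_mean_sum(group_totals(lambda j, k, l: (k, l)), p) - corr - O - T
    residual = total - S - O - T - SO - ST - OT       # by subtraction, as in (23.40)
    return {
        "samples (S)": (S, p - 1),
        "operators (O)": (O, q - 1),
        "tests (T)": (T, r - 1),
        "SO": (SO, (p - 1) * (q - 1)),
        "OT": (OT, (q - 1) * (r - 1)),
        "ST": (ST, (p - 1) * (r - 1)),
        "residual": (residual, (p - 1) * (q - 1) * (r - 1)),
        "totals": (total, p * q * r - 1),
    }


for name, (ss, df) in analyse_by_totals(WATER).items():
    print(f"{name:14s} {ss:9.3f} {df:3d} {ss / df:8.3f}")
```

Each divisor passed to `squared_mean_sum` is the number of observations making up one total (12 per sample, 18 per operator, 24 per test, and 3, 4 or 6 for the cells of the two-way tables), which is exactly the bookkeeping the text does by hand.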

We can now make up the table of variance analysis as follows:

TABLE 23.11

Analysis of Variance of Data of Table 23.7.

                            Sum of Squares.    d.f.    Quotient.
Between samples (S)             233.736          5      46.747
Between operators (O)            16.153          3       5.384
Between tests (T)                 3.444          2       1.722
Interaction SO                   66.097         15       4.406
Interaction OT                    4.222          6       0.704
Interaction ST                   57.889         10       5.789
Residual                         87.778         30       2.926
Totals                          469.319         71

We proceed to discuss the data in the light of this analysis. 

The most striking feature of the table is the size of the quotient between samples. The variance ratio here is

$$\frac{46.747}{2.926} = 15.976,$$

with a corresponding value of z equal to 1.39. For $\nu_1 = 5$, $\nu_2 = 30$ the 0.1-per-cent point is 0.8664, and the ratio is highly significant.

We remark in passing on a point which will be taken up later. The ordinary z-test gives the probabilities that the ratio of two variances chosen at random does not exceed a given value. But in this case we have deliberately picked out the largest quotient for one of our estimates. If z had fallen at the 5-per-cent level we could not have argued that the odds were 19 to 1 against the event. They are very much less, since we have deliberately chosen the largest value for comparison with the residual. However, in the present case our probability is so small that we can confidently assume the significance of z (see 23.27 below).

Our first inference, then, is that the whole sample is not homogeneous. There appear to be variations from sample to sample which are not assignable to differences between tests or operators, and if we wished to standardise our product with greater accuracy we should be led to examine the manufacturing process. This conclusion is, however, subject to a point which we discuss in the next example.

Having rejected the hypothesis of homogeneity we are now faced with the question whether the other quotients in Table 23.11 can be compared so as to assess the relative variability of the other factors. We must then take a new hypothesis, and we will suppose that the variable may be written

$$x_{jkl} = a_j + \xi_{jkl}, \qquad (23.41)$$

where $a_j$ is an unknown quantity expressing the accepted variation between samples. Unless there is something very peculiar about the tests or operators it is reasonable to suppose that the variation between samples can be isolated in this way. (We will now suppose that the $\xi$'s, not the x's, are distributed normally with common mean and variance v.)

If the values given by (23.41) are substituted in the various constituent items of Table 23.5, it will be found that except for the variation between samples all the other sums of squares assume the same form with $\xi$ written instead of x. This, of course, follows from 23.20, of which our present hypothesis is a particular case. On the hypothesis of (23.41) we are thus enabled to compare the quotients in the table in the usual way. The element of variation between samples has, so to speak, been abstracted from the discussion.

We then turn to the sum of squares between operators in Table 23.11. The variance ratio is 5.384/2.926 = 1.84. For $\nu_1 = 3$, $\nu_2 = 30$ this is not significant. Similarly, for the sum of squares between tests we find a ratio of 1.722/2.926 = 0.59, again not significant. Provisionally we conclude that there is no evidence of variation between operators and tests, apart from pure sampling effects.

Now we have to consider the interactions. For that of SO we have the variance ratio 4.406/2.926 = 1.51, which is not significant. We find the same for the interaction ST. For OT we have (taking the larger variance as the numerator)

$$z = \tfrac{1}{2}\log_e \frac{2.926}{0.703} = 0.713, \qquad \nu_1 = 30, \quad \nu_2 = 6.$$

This value is just beyond the 5 per cent point and, judged by itself, might have been regarded as significant; but taken in conjunction with the others it may, perhaps, be accepted as a permissible sampling fluctuation.
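The variance ratios used in this discussion are simple functions of the quotients in Table 23.11. The Python sketch below (the function name is ours) computes each ratio and Fisher's $z = \tfrac{1}{2}\log_e$ (ratio). It does not compute percentage points; those must still be read from tables of the z-distribution for the appropriate $\nu_1$, $\nu_2$.

```python
from math import log


def variance_ratio_z(numerator, denominator):
    """Variance ratio of two quotients and the corresponding z = (1/2) log_e(ratio)."""
    ratio = numerator / denominator
    return ratio, 0.5 * log(ratio)


residual = 2.926                    # residual quotient of Table 23.11 (30 d.f.)
quotients = {"S": 46.747, "O": 5.384, "T": 1.722, "SO": 4.406, "ST": 5.789}
for name, quotient in quotients.items():
    ratio, z = variance_ratio_z(quotient, residual)
    print(f"{name:3s} ratio = {ratio:7.3f}   z = {z:7.3f}")

# For OT the residual is the larger quotient, so it is taken as the numerator.
ratio, z = variance_ratio_z(residual, 0.703)
print(f"OT  ratio = {ratio:7.3f}   z = {z:7.3f}")
```

A ratio below 1 (as for tests) gives a negative z; the text handles the one case where the residual is the larger quotient by inverting the ratio, as the last line does.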

To sum up, therefore, the only evidence of deviation from homogeneity appears in the sample-differences, and we see no reason to reject the hypothesis represented by (23.41). Since all the other items in the analysis, apart from that between samples, are homogeneous, we could condense the table into the form:

                      Sum of Squares.    d.f.    Quotient.
Between samples           233.736          5      46.747
Remainder                 235.583         66       3.569
Totals                    469.319         71

The reader may wonder why, in carrying out the tests of significance, we have throughout used the residual quotient as the denominator of the variance ratio, and not, for instance, one of the interactions. There are two reasons. First, the residual has more degrees of freedom, so that it is preferable notwithstanding that the z-test is valid for any number of degrees of freedom. Second, the residual is not so likely to be affected by interactions which, though not emerging into significance, might nevertheless exist. But once we have established that an interaction is not significant, there is no reason why it should not be amalgamated with the residual, as in the condensed table above.

Example 23.4

There is a point of great importance concerning the inference from analyses of variance,
which we will illustrate by an imaginary example based on the data we have just
considered. Suppose our analysis of variance were of the following form:


                      Sum of Squares.    d.f.    Quotient.
Between samples             125            5         25
Between operators            60            3         20
Interaction SO              150           15         10
Remainder                    48           48          1
Totals                      383           71


We will suppose that the sums of squares between tests and the other first-order inter- 
actions are not significant, so that they can be amalgamated with the residual to give a 
remainder with 48 degrees of freedom as shown. 

On this evidence the sums of squares between samples and between operators are both significant, as also is the interaction SO. What inference can be drawn about the variability of the product from one sample to another? We know that the readings differ significantly; but may not this difference itself be due to the demonstrated variation between operators, or does it really exist? Is there in fact any variability in the water-content of the product, apart from the sampling effect in homogeneous variation?

The significance of the SO interaction means that we cannot now regard the effects 
of operator and sample as independent. We must consider the possibility of entanglement. 
This is not the only explanation—there may be some other specific cause of variation 
present which we have not thought of, and on which our present data throw no light. But 
in this case there is some prior possibility that samples and operators are “ entangled ” or 
interacting in the ordinary sense. An operator may be getting better results from his 
material when it has high water-content than in the reverse case ; or, knowing that the 
mean content is near 60 per cent, he may unconsciously (or even consciously) bring his 
determinations nearer to that figure and hence reduce their spread. 

In a case of this kind, and indeed in all statistical inquiries, it is important to have 
a clear idea of the question which is being asked and of the population to which it relates. 
We have had a number of samples and have tested them by four operators each using 
three tests. So far as we can see, the tests are equivalent but the operators are not. All 
the same, we are not very interested in the variation among operators (unless this is 
an experiment in psychology and not in chemistry). What we want to know is whether 
the water-content varies in reality, that is to say as the average of a large number of
determinations by different operators. Our particular four are themselves samples of 
a population of operators. 
If we confine our attention to the four operators and suppose that each has a specific reaction to particular samples, so that

$$x_{jk} = m_{jk} + \xi_{jk}, \qquad (23.42)$$

where $\xi$ is a normal random residual with variance v for all j, k, then in the usual way we find

$$E\sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p-1)(q-1)v + \sum_{j,k}(m_{jk} - m_{j.} - m_{.k} + m_{..})^2. \qquad (23.43)$$

But suppose we consider the matter from a different viewpoint. Regard $m_{jk}$ as itself chosen at random from a normal population of operators with variance v'. Then, taking expectations over this population in addition, we find from (23.43)

$$E\sum_{j,k}(x_{jk} - x_{j.} - x_{.k} + x_{..})^2 = (p-1)(q-1)(v + v'). \qquad (23.44)$$

Thus the interaction term provides an unbiassed estimator of the variance v + v' of $x_{jk}$. By “ unbiassed ” in this connection we mean that the average over all determinations and all operators will give the variance of $x_{jk}$ in the population of all determinations and all operators.

Similarly we shall have, on the same interpretation,

$$qE\sum_j (x_{j.} - x_{..})^2 = (p-1)(v + v'),$$
$$pE\sum_k (x_{.k} - x_{..})^2 = (q-1)(v + v'), \qquad (23.45)$$

and hence the ratio of either interaction of zero order to the first-order interaction may be tested for homogeneity. Our analysis then becomes:


                      Sum of Squares.    d.f.    Quotient.
Between samples             125            5         25
Between operators            60            3         20
Residual (SO)               150           15         10
Totals                      335           23

Neither ratio is now significant. For the sum of squares between samples we have a ratio of 2.5, with $\nu_1 = 5$, $\nu_2 = 15$, which is below the 5 per cent point.

Thus we should conclude that, regarding the data as one of the possible samples from the population of all operators, there is little or no evidence of real variation from sample to sample. This is quite consistent with the inference we drew at the beginning of the example as to the “ significance ” of the terms concerned, though at first sight it appears directly contradictory. In the first case we inferred that for these four operators there were significant differences in their determinations for the samples, so that sample-differences are “ real ” in the sense that they cannot be attributed solely to random variation in homogeneous material. In the second case we enlarge the domain by considering operators as “ subject to error ” in the sense that one human being differs from another, and find that sample-differences can now be ascribed to variation in the population of operators.

No further emphasis is needed on the care necessary for the proper interpretation of
the results of an analysis of variance. The nature of the population which is being
considered should be brought explicitly to mind in every case; and the reader should form
the habit of asking himself, whenever a result is found to be “significant”: significant
of what?


Arithmetic of Variance Analysis 

23.26. Before considering further examples we will dispose of a few points arising
from the calculation of the constituent sums of squares and the application of the z-test
in determining the significance of variance-ratios.

The calculation of sums of squares for an n-way classification can very conveniently
be carried out by the use of a punched-card system when the data are numerous, and some
remarkable computing feats have been performed by this technique. For ordinary
laboratory work with a machine, the process of Example 23.3 is possibly the best, though some
modifications may be made to suit individual taste.

The main work lies in computing the total sum of squares. This is done by finding
the sum of squares of observations from the original data (with a convenient working
mean) and the sum of observations obtained at the same time. The formula

$$\sum (x_{jkl\ldots} - x_{\cdot\cdot\cdot\ldots})^2 = \sum x_{jkl\ldots}^2 - N x_{\cdot\cdot\cdot\ldots}^2, \qquad (23.46)$$

where N is the total number of observations, then gives the total sum required. The
quantity N x_{\cdot\cdot\cdot\ldots}^2 is constantly needed and should be recorded. It is useful
to preserve a few more decimal places than will ultimately be used in the final presentation
of the analysis.
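The working-mean procedure can be sketched as follows. This is a minimal Python illustration of (23.46). The function name and the sample values are illustrative only. Deviations from the working mean keep the figures small, and the result does not depend on which working mean is chosen.

```python
def total_sum_of_squares(values, working_mean=0.0):
    """Sum of squares about the mean, as in (23.46), computed from a working mean."""
    deviations = [x - working_mean for x in values]
    n = len(deviations)
    s1 = sum(deviations)                  # sum of the shifted observations
    s2 = sum(d * d for d in deviations)   # sum of their squares
    correction = s1 * s1 / n              # N times the squared (shifted) mean
    return s2 - correction

data = [3, 5, 7, 9]
print(total_sum_of_squares(data), total_sum_of_squares(data, working_mean=5))
```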

The original data are then condensed into n (n − 1)-way tables by summing over
each class in turn. In Example 23.3 this was done so as to give three tables: Operators-
Samples, Tests-Samples and Operators-Tests. The main body of these tables gives means
of the type x_{jk\cdot} multiplied by a constant factor. A further condensation will give
sets of means of type x_{j\cdot\cdot}; and so on, as far as is required.

From the condensed tables we can then determine the sums of squares of means of
various orders, and hence the interactions. The main pitfall lies in the application of
the correct multipliers and divisors: it has to be borne in mind that the summation
takes place over all values of the sample.

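Suppose, for example, we have a four-way classification with p, q, r and s classes
respectively. The first condensation gives us four tables of which a typical one
is p × q × r, based on the sum of s members. The next condensation gives us six two-way
tables typified by p × q, based on the sum of rs members. The third gives us four
one-way tables such as p, based on qrs members. Consider the variance between p-classes:

$$\sum (x_{j\cdot\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 = \sum x_{j\cdot\cdot\cdot}^2 - N x_{\cdot\cdot\cdot\cdot}^2. \qquad (23.47)$$

In the condensed one-way table of p classes each entry is the total of qrs members, so that
x_{j\cdot\cdot\cdot} is that entry divided by qrs, and in the summation over all members each
such mean is to be counted qrs times. Thus, if S is the sum of squares of the entries in
this table as it stands,

$$\sum_{j=1}^{p} x_{j\cdot\cdot\cdot}^2 = \frac{S}{(qrs)^2}.$$

Thus, summing over all members, we find

$$\sum x_{j\cdot\cdot\cdot}^2 = \frac{S}{qrs}, \qquad (23.48)$$

whence (23.47) gives the zero-order interaction for p-classes. Similarly for q, r and s.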
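For the first-order interaction we have

$$\sum (x_{jk\cdot\cdot} - x_{j\cdot\cdot\cdot} - x_{\cdot k\cdot\cdot} + x_{\cdot\cdot\cdot\cdot})^2 = \sum (x_{jk\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 - \sum (x_{j\cdot\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 - \sum (x_{\cdot k\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2. \qquad (23.49)$$

The last two terms on the right have already been found. We require

$$\sum (x_{jk\cdot\cdot} - x_{\cdot\cdot\cdot\cdot})^2 = \sum x_{jk\cdot\cdot}^2 - N x_{\cdot\cdot\cdot\cdot}^2. \qquad (23.50)$$

If S' is the sum of squares of elements in the body of the two-way table found by adding
r- and s-items, we find

$$\sum x_{jk\cdot\cdot}^2 = \frac{S'}{rs}, \qquad (23.51)$$

and so on. The general process will now be clear.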
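The bookkeeping of (23.47) to (23.51) can be carried out mechanically. Each sum of squared means comes from a condensed table of totals, divided by the number of members behind each total, and each interaction then follows by inclusion and exclusion. The Python sketch below assumes one observation per cell and indexes the classifications by position. It illustrates the procedure and is not a reproduction of any computing scheme described in the text.

```python
from itertools import combinations

def condensed_sum(data, factors):
    """Sum over all members of the squared class means for the given factors.

    `data` maps a full index tuple (one index per classification) to the single
    observation in that cell.  Summing over the other classifications gives the
    condensed table of totals T; each total covers m members, and the required
    sum is sum(T**2) / m, as in (23.48) and (23.51).
    """
    totals = {}
    for index, x in data.items():
        key = tuple(index[f] for f in factors)
        totals[key] = totals.get(key, 0.0) + x
    members = len(data) // len(totals)
    return sum(t * t for t in totals.values()) / members

def interaction_ss(data, factors):
    """Interaction sum of squares for a set of classifications (inclusion-exclusion).

    One factor gives the zero-order term (23.47); two factors reproduce (23.49).
    """
    total = 0.0
    for size in range(len(factors) + 1):
        for subset in combinations(factors, size):
            sign = (-1) ** (len(factors) - size)
            total += sign * condensed_sum(data, subset)
    return total

# A 2 x 2 example: rows 0-1, columns 0-1.
table = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
print(interaction_ss(table, (0,)), interaction_ss(table, (1,)), interaction_ss(table, (0, 1)))
```

With one observation per cell, the terms of all orders (the highest-order one being the residual) add up to the total sum of squares.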

Unfortunately there is no convenient independent check on the calculations. The
various condensed tables are self-checking since their totals are the sum of all observations,
but the sums of squares do not check with anything. It is, of course, possible to evaluate
each individual term in the residual and to check by summing squares, but this is too
laborious for use except in the simplest cases.

Use of the z-test for Several Variance-ratios

23.27. In the complete analysis of n classes there are 2^n − 1 elements, and the
number of variance-ratios arising for test may be considerable. The z-test gives the
probability that a particular value chosen at random will be exceeded. If therefore we pick
out the largest ratios for test, the chance that one of them is “significant” in the sense
of exceeding the 100P-per-cent point is a good deal greater than P, and we run into the
danger of attributing significance to what may be a pure sampling effect.

Suppose we make r different and independent tests of r values of z. The chance that
each does not exceed a fixed value (depending on the number of degrees of freedom) is
1 − P, where P is some assigned level of significance. Hence the chance that none of
them exceeds its appropriate value is

$$(1 - P)^r \simeq 1 - rP, \quad\text{approximately}, \qquad (23.52)$$

provided that P and rP are small. For instance, if P = 0.01 and r = 7 the probability
that no z exceeds its appropriate significance value is 0.93, and thus there is a probability
of 0.07 that at least one of them will do so.
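The exact value (1 − P)^r and the approximation 1 − rP of (23.52) can be compared directly. The following is a minimal Python sketch. It includes r = 15 because a complete four-way analysis has 2^4 − 1 = 15 elements.

```python
def chance_none_exceeds(p_level, r):
    """Exact chance that none of r independent tests exceeds its 100P-per-cent point."""
    return (1.0 - p_level) ** r

for r in (1, 3, 7, 15):
    exact = chance_none_exceeds(0.01, r)
    approx = 1.0 - r * 0.01          # first-order approximation (23.52)
    print(f"r={r:2d}  exact={exact:.4f}  approx={approx:.4f}  at least one={1 - exact:.4f}")
```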

In practice the problem of numerous comparisons is more complicated because they 
are not independent. In such circumstances our judgment of significance has to incor- 
porate an element of the intuitive. However, if all the comparisons are based on the 
common residual quotient it is possible to find the probabilities that the largest of r values 
exceeds assigned values. The resulting expressions are complicated, even when all the 
sums of squares have the same degrees of freedom, but reference may be made to Hartley 
(1938) for approximations and to Cochran (1941) and Finney (1941a) for exact expressions. 
The conclusion reached by Finney is that if the degrees of freedom in the residual are 
sufficiently numerous the ratios may be treated as completely independent. 

23.28. There is a particular case of the n-way classification which is worth special
mention, namely, that for which each classification is a simple dichotomy, so that there
are 2^n subgroups. This case arises frequently when so-called “factorial” experiments
are being conducted to determine the effect of a treatment which is either applied or
withheld. The analysis of variance remains the same in principle, but of course the arithmetic
becomes a good deal simpler.

Example 23.5 (F. Yates, Supp. J.R.S.S., 1935, 2, 181)

An area of ground was sown with peas and divided into 24 plots in the manner shown
in Table 23.12. The plots received, or did not receive, dressings of nitrogen (N), phosphate
(P) and potash (K) in the manner shown, the yields in pounds being given in the table.

TABLE 23.12 


Yields of Peas and Manurial Treatments on 24 Plots 



There is some purpose here in the alternation of treatments, but that need not concern us 
for the present. We have 24 observations in four classes, viz. blocks (3), nitrogen (2), 
phosphate (2) and potash (2), giving 3x2x2x2 = 24 records. 

Condensing the table by adding blocks we get the following:

No treatment     N       P       K      NP      NK      PK     NPK     Total
   154.3       191.3   163.0   156.0   173.8   164.0   151.5   163.1   1317.0

Condensing according to the three treatments we have — 
              K       not-K     Totals
N           327.1     365.1      692.2
not-N       307.5     317.3      624.8
Totals      634.6     682.4     1317.0


We omit the remaining calculations. The analysis in its final form is given in
Table 23.13.
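Because N, P and K are each simple dichotomies, the treatment lines of Table 23.13 can be obtained directly from the eight treatment totals. Each main effect or interaction is a single contrast with signs ±1, and its sum of squares is the squared contrast divided by the number of plots, 24. The Python sketch below takes the untreated and K totals as 154.3 and 156.0, the values that agree with the grand total of 1317.0 and with the N × K table. The block lines need the plot data of Table 23.12 and are not reproduced here.

```python
from itertools import combinations

# Totals over the three blocks for each treatment combination (Example 23.5).
# Keys list the fertilisers applied; "" is the untreated plot.
TOTALS = {"": 154.3, "N": 191.3, "P": 163.0, "K": 156.0,
          "NP": 173.8, "NK": 164.0, "PK": 151.5, "NPK": 163.1}
REPLICATES = 3
FACTORS = "NPK"

def effect_ss(effect):
    """Sum of squares (1 d.f.) for a main effect or interaction such as 'NK'."""
    contrast = 0.0
    for treatment, total in TOTALS.items():
        sign = 1
        for factor in effect:
            sign *= 1 if factor in treatment else -1   # +1 if applied, -1 if withheld
        contrast += sign * total
    n_plots = REPLICATES * 2 ** len(FACTORS)
    return contrast ** 2 / n_plots

effects = ["".join(c) for r in range(1, 4) for c in combinations(FACTORS, r)]
table = {e: effect_ss(e) for e in effects}
for e, ss in table.items():
    print(f"{e:>4}: {ss:8.3f}")
```

The results agree with Table 23.13 to within one unit in the third decimal place.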

TABLE 23.13

Analysis of Variance of the Data of Table 23.12

                            Sums of Squares.   d.f.   Quotient.
Between blocks (B) . .           177.803          2      88.90
Between N  . . . . . .           189.282          1     189.28
Between P  . . . . . .             8.402          1       8.40
Between K  . . . . . .            95.202          1      95.20
Interaction BN . . . .            94.255          2      47.13
Interaction BP . . . .             2.260          2       1.13
Interaction BK . . . .            23.685          2      11.84
Interaction NP . . . .            21.281          1      21.28
Interaction NK . . . .            33.134          1      33.13
Interaction PK . . . .             0.481          1       0.48
Interaction BNP  . . .            25.302          2      12.65
Interaction BNK  . . .            36.004          2      18.00
Interaction BPK  . . .             3.782          2       1.89
Interaction NPK  . . .            37.003          1      37.00
Residual (BNPK)  . . .           128.489          2      64.24
Totals . . . . . . . .           876.365         23


We have carried out the analysis in full so as to illustrate the arithmetical process
for a four-way classification, but we may note at once that it is unduly elaborate. There
are only 24 observations in the data and we cannot expect them to provide all the answers
to the questions which we could frame as to the significance of the various constituent
items in the analysis. This is borne out by the z-test. The residual variance is 64.24
with two degrees of freedom. For ν1 = 1, ν2 = 2 the variance ratio at the 1-per-cent
point is 98.49, and that for ν1 = 2, ν2 = 2 at the same point is 99.00. Only values greater
than about 100 times 64.24 or less than 1/100th of that value would thus be significant.
Only the interaction PK falls outside this range, and even this, among so many, can hardly
be regarded as significant.

The inquiry is not, however, completely frustrated. Since the second-order
interactions are not significant, we amalgamate them with the residual to give a remainder
sum of squares of 230.580 with nine d.f. and a quotient of 25.62. It will now be found
that among the first-order interactions only two are significant, namely PK and BP, and these
because they are too small. Had they been too large we might have attributed some genuine
significance to this result, but it is not very plausible to suppose that there is a “real”
interaction between blocks and phosphate, or that phosphate and potash inhibit each other’s
action. The differences from expectation are more probably due to individual soil variation
from plot to plot.
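The amalgamation described above is simple bookkeeping. The following minimal Python sketch pools the second-order interactions with the residual and then lists each first-order quotient as a ratio to the new remainder quotient. The function name `pool` is illustrative only. The judgement that PK and BP are "too small" needs lower-tail points of the distribution, and the sketch does not compute them. It only shows that these two ratios are the smallest.

```python
def pool(terms):
    """Amalgamate (sum of squares, d.f.) pairs into one remainder line."""
    terms = list(terms)
    ss = sum(s for s, _ in terms)
    df = sum(d for _, d in terms)
    return ss, df, ss / df

second_order = {"BNP": (25.302, 2), "BNK": (36.004, 2), "BPK": (3.782, 2),
                "NPK": (37.003, 1), "Residual": (128.489, 2)}
remainder_ss, remainder_df, remainder_q = pool(second_order.values())
print(f"remainder: {remainder_ss:.3f} on {remainder_df} d.f., quotient {remainder_q:.2f}")

first_order = {"BN": (94.255, 2), "BP": (2.260, 2), "BK": (23.685, 2),
               "NP": (21.281, 1), "NK": (33.134, 1), "PK": (0.481, 1)}
for name, (ss, df) in first_order.items():
    print(f"{name}: quotient {ss / df:6.2f}, ratio to remainder {ss / df / remainder_q:.3f}")
```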

If we accept the first-order interactions as not significant, we may amalgamate them
with the remainder to give the following:

                  Sum of Squares.   d.f.   Quotient.
Blocks . . .           177.803        2       88.90
N  . . . . .           189.282        1      189.28
P  . . . . .             8.402        1        8.40
K  . . . . .            95.202        1       95.20
Remainder  .           405.676       18       22.54
Totals . . .           876.365       23

Here the P-quotient is not significant, but the variance ratio for blocks, 3.94, is near the
5-per-cent point. The N-quotient will be found to be significant at the 1-per-cent point,
and the K-quotient near to the 5-per-cent point. Our conclusion is that there is strong
indication that nitrogen influenced the yield, some indication that potash did so, and little
indication that phosphates did so; and that there is ground for suspecting heterogeneity in the
soil, partly because of the difference between blocks and partly from some of the first-order
interactions.
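The variance-ratios of this final analysis can be set against tabulated points. The following is a minimal Python sketch. The 5- and 1-per-cent points for 18 d.f. in the denominator (4.41 and 8.29 for one numerator d.f., 3.55 and 6.01 for two) are taken from standard F tables and are assumptions of the sketch. The blocks ratio comes out at about 3.94, which lies between the 5- and 1-per-cent points.

```python
# Final analysis after amalgamating all interactions into the remainder.
remainder_ss, remainder_df = 405.676, 18
remainder_q = remainder_ss / remainder_df

effects = {"Blocks": (177.803, 2), "N": (189.282, 1), "P": (8.402, 1), "K": (95.202, 1)}

# Tabulated points of the variance ratio with 18 d.f. in the denominator
# (standard F tables), keyed by numerator d.f.: (5 per cent, 1 per cent).
points = {1: (4.41, 8.29), 2: (3.55, 6.01)}

ratios = {}
for name, (ss, df) in effects.items():
    ratios[name] = (ss / df) / remainder_q
    five, one = points[df]
    print(f"{name:>6}: ratio {ratios[name]:5.2f}  (5%: {five}, 1%: {one})")
```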

In this case, of course, we knew already more or less what was to be expected of these 
data and are the readier to accept the conclusions on that account. Had we known nothing 
of the effect of fertilisers on leguminous crops our conclusions on such slender evidence 
must have been very tentative indeed, particularly if we wished to extend them to peas 
grown on other soils under different climatic conditions with different amounts of fertiliser. 


Example 23.6 (C. E. Gould and W. M. Hampton, Supp. J.R.S.S., 1936, 3, 137)

In the manufacture of optical glass there appear small bubbles known as “seed”, 
which constitute a defect. The glass is made in “ pots ” which take about a year to pre- 
pare, and are run continuously over long periods when once started. There are two pots 
to a furnace and materials are introduced into a pot from time to time which, after fusion, 
provide a “ run ” of glass. Each run provides several days’ work, one day’s work being 
known as a “ journey ”. At each journey quantities of glass are drawn from the pot and 
blown into “ cylinders ”, there being about 18 or 20 to the journey. For the purposes of 
the experiment three cylinders were chosen, the third, tenth and sixteenth, and pieces of 
regular size cut from them for examination as to frequency of seed. The first five journeys 
of each of five runs were sampled. 

We have here a four-way classification 2 (pots) × 5 (runs per pot) × 5 (journeys per
run per pot) × 3 (cylinders per journey per run per pot). The actual dates of the runs
were February 16th, May 23rd, June 12th, September 1st and December 6th, so that the
manufacturing period covered about ten months. We shall assume that the glass was
of the same type throughout, although in actual fact it was different in one or two cases,
but not sufficiently different to affect the analysis.

The topic of main interest here is whether the frequency of seed varies significantly
according to the four factors concerned. If so, an alteration of manufacturing conditions
may reduce the wastage due to seed; but if not, and the variation is the kind of thing
which can be accounted for as chance fluctuation in sampling from a homogeneous
population, there is little hope of improvement except perhaps by a radical alteration in the
process affecting all pots, runs and journeys alike.


TABLE 23.14

Frequency of “Seed” in Samples of Glass

                       Pot 1.                      Pot 2.
Run  Journey   Cyl. 1  Cyl. 2  Cyl. 3     Cyl. 1  Cyl. 2  Cyl. 3
 1     J1        47      56     100         52      61      88
       J2        55      89      93         49      62      97
       J3        35      57      56         34      60      72
       J4        78      67     113         47      93     118
       J5        33      40     128         16      29     130
 2     J1        52      66      36         65      80      40
       J2        21      61      49        122      97      79
       J3        31      39      25         45      54      72
       J4        43      72      52        109     120      80
       J5        37      51      67         67      85      63
 3     J1        50      61      60         75     139     130
       J2        33      27      49         46      58      63
       J3        24      39      24         15      33      39
       J4        18      18      43         22      16      19
       J5        28      42      28         27      19      22
 4     J1        24      34      43         46      66      24
       J2        24      49      42         40     117     105
       J3        21      21      51         30      28      34
       J4        21      69      48         36      64      53
       J5        76      48      42         39      60      78
 5     J1        31      54      40         19      93      36
       J2        34      24      46         16      12       2
       J3       120     122     120         33      58     107
       J4       109     119     120         25      63      90
       J5        69      49      60         34      43      30

Before plunging into the analysis of variance it is as well to look over the data to see
whether they themselves suggest any lines of inquiry. We observe considerable
variability from journey to journey within the same run, J3 and J4 of run 5 being conspicuous
in pot 1; and in run 1 the numbers of seed appear to increase from cylinder 1 to cylinder 3
in a rather exceptional way. The runs themselves seem to differ materially. Prior
considerations also suggested an examination of the way in which frequency of seed varied
between pots, since they were chosen so as to differ substantially in constitution.

A complete analysis of variance of the data is as follows : — 


TABLE 23.15

Analysis of Variance of the Data of Table 23.14

                              Sums of Squares.   d.f.   Quotient.
Between pots (P) . . . .              898           1        898
Between runs (R) . . . .           14,059           4      3,515
Between journeys (J) . .            4,355           4      1,089
Between cylinders (C) .            10,631           2      5,316
Interaction PR . . . . .           16,133           4      4,033
Interaction PJ . . . . .            4,081           4      1,020
Interaction PC . . . . .              587           2        293
Interaction RJ . . . . .           45,934          16      2,871
Interaction RC . . . . .           11,626           8      1,453
Interaction JC . . . . .            2,540           8        317
Interaction PRJ  . . . .            9,711          16        607
Interaction RJC  . . . .           12,472          32        390
Interaction JCP  . . . .            1,656           8        207
Interaction CPR  . . . .            1,862           8        233
Residual (PRJC)  . . . .            8,110          32        253
Totals . . . . . . . . .          144,655         149

The second-order interactions will be found non-significant, so we amalgamate them with
the residual, giving a sum of squares 33,811, d.f. 96, quotient 352.

It then appears that of the first-order interactions PR, RJ and RC are significant and
PJ may be so. There is beginning to appear evidence of heterogeneity, and that of a rather
complicated kind. It seems that pots are interacting with runs, runs with journeys and
runs with cylinders.

Taking 352 as the quotient, we find that except for P the zero-order interactions are
significant. The five R-means are 68.50, 62.67, 42.23, 47.77 and 59.27, so that the variation
of runs is not a simple rise or fall, which could have been explained as a time-effect. The
five J-means are 58.93, 55.37, 49.97, 64.83 and 51.33, again not a regular effect. The
C-means are 44.46, 59.68 and 64.12, which are significantly different. Inspection of the
table suggests that the first run is the source of the trouble.
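The means quoted above, and the zero-order lines of Table 23.15, can be recomputed from the counts of Table 23.14. The following is a minimal Python sketch. The entry for pot 1, cylinder 3, journey 4 of run 3 is taken as 43, the value implied by the quoted R-, J- and C-means. The variable and function names are illustrative only.

```python
# Table 23.14: for each run, five journeys, each with six counts in the order
# Pot 1 (Cyl. 1, 2, 3) then Pot 2 (Cyl. 1, 2, 3).
SEED = {
    1: [[47, 56, 100, 52, 61, 88], [55, 89, 93, 49, 62, 97], [35, 57, 56, 34, 60, 72],
        [78, 67, 113, 47, 93, 118], [33, 40, 128, 16, 29, 130]],
    2: [[52, 66, 36, 65, 80, 40], [21, 61, 49, 122, 97, 79], [31, 39, 25, 45, 54, 72],
        [43, 72, 52, 109, 120, 80], [37, 51, 67, 67, 85, 63]],
    3: [[50, 61, 60, 75, 139, 130], [33, 27, 49, 46, 58, 63], [24, 39, 24, 15, 33, 39],
        [18, 18, 43, 22, 16, 19], [28, 42, 28, 27, 19, 22]],
    4: [[24, 34, 43, 46, 66, 24], [24, 49, 42, 40, 117, 105], [21, 21, 51, 30, 28, 34],
        [21, 69, 48, 36, 64, 53], [76, 48, 42, 39, 60, 78]],
    5: [[31, 54, 40, 19, 93, 36], [34, 24, 46, 16, 12, 2], [120, 122, 120, 33, 58, 107],
        [109, 119, 120, 25, 63, 90], [69, 49, 60, 34, 43, 30]],
}

# Flatten to (pot, run, journey, cylinder, count) records.
records = [
    (col // 3, run, j, col % 3, x)
    for run, journeys in SEED.items()
    for j, counts in enumerate(journeys)
    for col, x in enumerate(counts)
]
N = len(records)                                  # 150 observations
grand = sum(r[-1] for r in records)
correction = grand ** 2 / N

def totals_by(position):
    """Totals of the counts for each class of one classification."""
    out = {}
    for r in records:
        out[r[position]] = out.get(r[position], 0) + r[-1]
    return out

def zero_order_ss(position):
    """Sum of squares between classes: sum(T**2)/members minus the correction."""
    t = totals_by(position)
    return sum(v * v for v in t.values()) / (N // len(t)) - correction

run_means = [t / 30 for _, t in sorted(totals_by(1).items())]
journey_means = [t / 30 for _, t in sorted(totals_by(2).items())]
cylinder_means = [t / 50 for _, t in sorted(totals_by(3).items())]
print([round(m, 2) for m in run_means])
print([round(m, 2) for m in journey_means])
print([round(m, 2) for m in cylinder_means])
print({name: round(zero_order_ss(pos)) for name, pos in (("P", 0), ("R", 1), ("J", 2), ("C", 3))})
```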

With data as heterogeneous as these it is rather difficult to set up a plausible hypothesis 
to test. The interactions of first order suggest that no simple additive effects of the four 
factors will explain observation, and if these terms are used as denominators in tests of 
variance ratios the variation between classes appears on the whole non-significant on the 
usual hypotheses. The analysis, then, suggests several subjects for inquiry as concerns 
the homogeneity of the data, but does not suggest any simple explanation of the observed 
figures. The reader may care to refer to the original paper for a more complete discussion 
of the subject. 
23.29. Perhaps we may pause at this point to review progress. We have seen
that for an n-way classification of the special type wherein each subclass contains a single
member, the sum of squares of all observations about their mean can be exhibited as the
sum of a number of such sums. On the hypothesis of normality and homogeneity each
constituent sum of squares, on division by its appropriate number of degrees of freedom,
gives an estimator of the parent variance, and each sum is distributed as vχ² independently of
the others. The hypothesis of homogeneity can then be tested in Fisher's z-distribution,
subject to the adoption of a conservative attitude where many tests are made on the same
data. If the hypothesis is rejected we may replace it by a simple form in which the effects
of the different classes are additive, provided that the interactions are not significant.
The particular ratio chosen for a test depends on the hypothesis concerned, and it is
important to have a clear idea of the exact question to which an answer is sought.

23.30. In the next chapter we shall consider the case when the numbers in different 
subclasses are not equal, discuss the additive hypothesis in more detail, examine the relation- 
ship of variance- and regression-analysis, and extend our results to the analysis of covariance. 
We conclude this chapter by an examination of the important question: what can be 
done with the analysis of variance when the variation is not normal ? 

Non-normal Data 

23.31. The analysis of a sum of squares into its constituent sums can, of course, be
undertaken in all circumstances, but the various quotients may not continue to provide
unbiassed estimators of the parent variance if the population is non-normal. What is
equally serious, the constituent sums of squares may not be distributed independently.
Thus, when parent normality cannot be assumed, the quotients in the analysis table are
no longer equal within sampling limits and their ratio is distributed in unknown form; and
even if the form were known it would probably depend on parent parameters and hence
fail to provide an exact test of significance.

The problem has been considered in four ways:

(a) Sampling experiments have been undertaken to see how far moderate deviation
from normality affects the z-distribution;

(b) Attempts have been made to find transformations of the variate to throw the
parent distributions into forms with equal variances, at least approximately,
before the analysis is applied;

(c) By introducing a randomising process into the data before they are collected,
attempts have been made to preserve the z-distribution as a close approximation;
this amounts to a change in the nature of the inference, as we shall see below;

(d) Tests have been found which can be applied to ranked data irrespective of the
parent form; this approach is a particular case of (c), but seems to merit special
mention.

We proceed to consider these four possibilities.

23.32. The arithmetic entailed by a single analysis of variance, even in simple cases,
implies that an extensive sampling inquiry into the distribution of z in non-normal
populations would be a very formidable undertaking. E. S. Pearson (1931b) has studied in some
detail the case of a one-way classification with unequal numbers, when the distribution
of z becomes equivalent to that of the correlation ratio η². Six populations were chosen,
characterised by the following values:

β1 = 0,     β2 = 2.50 (symmetrical platykurtic);
β1 = 0,     β2 = 4.1  (symmetrical leptokurtic);
β1 = 0,     β2 = 7.05 (symmetrical leptokurtic);
β1 = 0.2,   β2 = 3.3  (skew, Type III);
β1 = 0.49,  β2 = 3.72 (skew, Type III);
β1 = 0.99,  β2 = 3.83 (very skew, Type I, with abrupt start).

The results suggested that for this range of β1 and β2 the distribution of z is adequately
represented by Fisher's distribution, and that therefore the homogeneity test may be
applied. The case when the variation changed from group to group was not considered.
It was also concluded that “it seems probable that the more elaborate forms of analysis
of variance are also of fairly wide application”.

Some work by Eden and Yates (1933) is often referred to as experimental confirmation
of the same kind, but in fact it was carried out with rather a different object, that of
confirming the z-test for data under randomisation (see below, 23.36).

Variate Transformations 

23.33. Suppose ξ is a new variate ξ(x). Then approximately we shall have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 \operatorname{var} x. \qquad (23.53)$$

If now the parent variance of the x-distribution is related in some known manner to the
mean, say f(m) = v, we have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 f(m).$$

As a further approximation, if x varies about m by small quantities we have

$$\operatorname{var} \xi = \left(\frac{d\xi}{dx}\right)^2 f(x). \qquad (23.54)$$

Now we wish ξ to have a constant variance, say λ, and if this is so,

$$\frac{d\xi}{dx} = \sqrt{\frac{\lambda}{f(x)}}, \qquad \xi = \int \sqrt{\frac{\lambda}{f(x)}}\, dx. \qquad (23.55)$$

Although this expression is arrived at by approximation we are entitled to hope that
the variate ξ will have almost constant variance, and at any rate a more stable variance
than x.

For instance, if the original variation is thought to be of the Poisson type we have
f(x) = x, and from (23.55) are led to consider the transformation

$$\xi = \sqrt{x}, \qquad (23.56)$$

if we choose λ to be ¼. Similarly, if the variation is of the binomial type with variance
p(1 − p) we have

$$\xi = \int \frac{\sqrt{\lambda}\, dx}{\sqrt{x(1 - x)}} = \sin^{-1}\sqrt{x}, \qquad (23.57)$$

on suitable choice of λ.
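The stabilising effect of (23.57) can be checked numerically. Applied to an observed proportion x/n, the transformed variate should have a variance near 1/(4n) whatever the value of p, whereas the variance of the raw proportion, p(1 − p)/n, changes with p. The following minimal Python sketch computes the exact variance of sin⁻¹√(x/n) by summing over the binomial distribution. The function name is illustrative only.

```python
import math

def binomial_variance_of_arcsine(n, p):
    """Exact variance of asin(sqrt(x / n)) when x is binomial with index n and chance p."""
    probs = [math.comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]
    values = [math.asin(math.sqrt(x / n)) for x in range(n + 1)]
    mean = sum(w * v for w, v in zip(probs, values))
    return sum(w * (v - mean) ** 2 for w, v in zip(probs, values))

n = 50
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    raw = p * (1 - p) / n                      # variance of the proportion x / n
    print(f"p={p}: var(x/n)={raw:.5f}  var(asin sqrt)={binomial_variance_of_arcsine(n, p):.5f}"
          f"  1/(4n)={1 / (4 * n):.5f}")
```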

23.34. These transformations are designed to “stabilise” the variance. They do
not necessarily bring the variate closer to normality, though in some cases they will do so;
we have, for instance, seen that the square root of χ² tends to normality more quickly than
χ² itself (12.7). The following values (Bartlett, 1936d) illustrate the way in which the
square-root transformation stabilises the variance of a Poisson distribution:


Mean m.   Variance of Poisson variate √x.   Variance of Poisson variate √(x + ½).
  0.0                0.000                               0.000
  0.5                0.310                               0.102
  1.0                0.402                               0.160
  2.0                0.390                               0.214
  3.0                0.340                               0.232
  4.0                0.306                               0.240
  6.0                0.276                               0.245
  9.0                0.263                               0.247
 12.0                0.259                               0.248
 16.0                0.256                               0.248


The term ½ in the third column was added by Bartlett on the analogy of a continuity
correction. For m > 3 the variance is evidently quite stable.
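Values of the kind shown in Bartlett's table can be computed directly by summing over the Poisson distribution. The following minimal Python sketch does this, with `shift` = 0 for √x and `shift` = 0.5 for √(x + ½). The cut-off of the series far into the upper tail is a choice made for the sketch and is not taken from the text.

```python
import math

def poisson_variance_of_root(m, shift=0.0):
    """Variance of sqrt(x + shift) when x is a Poisson variate with mean m.

    The expectations are summed term by term; the series is cut off far into
    the upper tail, where the neglected probability is negligible for these m.
    """
    if m == 0:
        return 0.0
    k_max = int(m + 12 * math.sqrt(m) + 30)
    prob = math.exp(-m)            # P(x = 0)
    mean_root = mean_square = 0.0
    for k in range(k_max + 1):
        mean_root += prob * math.sqrt(k + shift)
        mean_square += prob * (k + shift)
        prob *= m / (k + 1)        # P(x = k + 1) from P(x = k)
    return mean_square - mean_root ** 2

for m in (0.0, 0.5, 1.0, 2.0, 3.0, 4.0, 6.0, 9.0, 12.0, 16.0):
    print(f"{m:5.1f}  {poisson_variance_of_root(m):.3f}  {poisson_variance_of_root(m, 0.5):.3f}")
```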

23.35. If now, having stabilised the variance, we carry out an analysis in the ordinary
way, our residual sums of squares divided by the appropriate degrees of freedom will
continue to be unbiassed estimates of the common variance v, even if there are differences
between the means of the classes. Instead of assuming as part of the hypothesis that the
different classes are distributed with the same variance, we have transformed the variate
so that this shall be so, at least to a close approximation. Relying further on the result
that the transformed variates approximate to normality, or that if they do not the
difference will not seriously vitiate the z-test, we may apply that test to the transformed data
in the usual way.


Example 23.7 (Bartlett, 1936d) 

Table 23.16 shows the number of wheat seeds out of 50 which failed to germinate in
four repetitions of an experiment with different treatments.
TABLE 23.16

Germination of Wheat Seeds

Number of                   Number of Treatment.
Experiment.     1     2     3     4     5     6     7     Totals.
    1          10    11     8     9     7     6     9       60
    2           8    10     3     7     9     3    11       51
    3           5    11     2     8    10     7    11       54
    4           1     6     4    13     7    10    10       51
Totals         24    38    17    37    33    26    41      216


In point of fact, treatment 7 was a repetition of treatment 6, the others being different. 
The point of interest is whether the treatments exert any effect on germination. We shall 
not inquire into any differences between experiments (which appear to be negligible from 
the row totals) and shall accordingly consider this as a one-way classification into seven 
classes, four numbers to the class. 

The presumption is that in any given class the variation is of the binomial type. We
might apply the sin⁻¹ transformation, but will adopt instead an ad hoc square-root
transformation obtained as follows. We have

$$v = np(1 - p).$$

Suppose now that p = p_0 + δ where δ is small. Then

$$v = n(p_0 + \delta - p_0^2 - 2p_0\delta - \delta^2) \simeq n\{(1 - 2p_0)(p - p_0) + p_0 - p_0^2\} = np(1 - 2p_0) + np_0^2.$$

Thus v ≃ (1 − 2p_0)(np + k), where k = np_0^2/(1 − 2p_0), so that the variance is
proportional to the mean increased by k, as in the Poisson case. If we now put

$$\xi = \sqrt{x + k + \tfrac{1}{2}}, \qquad k = \frac{np_0^2}{1 - 2p_0},$$

where x is the observed frequency and the ½ is added as in Bartlett's table, then ξ will tend
to have constant variance.

In our example the total frequency is 216 out of 1400 seeds, so that we may take as
an estimate of p_0 the ratio 216/1400 ≈ 0.15. The transformed variate then becomes

$$\xi = \sqrt{x + \tfrac{1}{2} + \frac{50\,(0.0225)}{0.70}} = \sqrt{x + 2}, \quad\text{approximately}.$$
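The shift constant and the linearisation behind it can be checked in a few lines. The following minimal Python sketch uses n = 50 and p_0 = 0.15, as in the text. The function names are illustrative only. The linearised variance agrees exactly with np(1 − p) at p = p_0, and it departs from it by nδ² elsewhere.

```python
def shift_constant(n, p0):
    """k = n p0^2 / (1 - 2 p0) from the linearised binomial variance."""
    return n * p0 ** 2 / (1 - 2 * p0)

def linearised_variance(n, p, p0):
    """n p (1 - 2 p0) + n p0^2, i.e. (1 - 2 p0)(n p + k): the approximation to n p (1 - p)."""
    return n * p * (1 - 2 * p0) + n * p0 ** 2

n, p0 = 50, 0.15
k = shift_constant(n, p0)
print(f"k = {k:.3f}; shift including the 1/2: {k + 0.5:.3f}")
for p in (0.10, 0.15, 0.20):
    print(p, n * p * (1 - p), linearised_variance(n, p, p0))
```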
On this basis the transformed variate-values are as follows:

TABLE 23.17

Transformed Variates of Table 23.16

Number of                            Number of Treatment.
Experiment.     1       2       3       4       5       6       7      Totals.
    1         3.464   3.606   3.162   3.317   3.000   2.828   3.317    22.694
    2         3.162   3.464   2.236   3.000   3.317   2.236   3.606    21.021
    3         2.646   3.606   2.000   3.162   3.464   3.000   3.606    21.484
    4         1.732   2.828   2.449   3.873   3.000   3.464   3.464    20.810
Totals       11.004  13.504   9.847  13.352  12.781  11.528  13.993    86.009


The analysis of variance is:

                          Sums of Squares.   d.f.   Quotient.
Between treatments . . .        3.486          6      0.581
Within treatments  . . .        4.316         21      0.206
Totals . . . . . . . . .        7.802         27



The total sum of squares is particularly easy to obtain, since the sum of the squared
transformed values is the sum of the original variates plus twice the number of variate-values
(216 + 2 × 28 = 272), from which the usual correction (86.009)²/28 is subtracted.

The variance ratio, 2.8, is barely significant, being just beyond the 5-per-cent point.
There is little evidence that treatments are exerting any effect on germination, since a
comparison of treatments 6 and 7 (which are the same) indicates that such “significance”
as exists may be due to heterogeneity in the seed.
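The analysis can be reproduced from the raw counts. The following minimal Python sketch takes exact square roots of x + 2. Because the text works from values rounded to three decimals, the sums of squares here differ from the tabulated ones in the third decimal place. The function name `one_way_anova` is illustrative only.

```python
import math

# Table 23.16: seeds (out of 50) failing to germinate; rows are experiments 1-4,
# columns are treatments 1-7.
COUNTS = [
    [10, 11, 8, 9, 7, 6, 9],
    [8, 10, 3, 7, 9, 3, 11],
    [5, 11, 2, 8, 10, 7, 11],
    [1, 6, 4, 13, 7, 10, 10],
]

def one_way_anova(groups):
    """Between-group, within-group and total sums of squares."""
    values = [x for g in groups for x in g]
    correction = sum(values) ** 2 / len(values)
    total = sum(x * x for x in values) - correction
    between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    return between, total - between, total

transformed = [[math.sqrt(x + 2) for x in row] for row in COUNTS]
treatments = [list(col) for col in zip(*transformed)]       # seven groups of four
between, within, total = one_way_anova(treatments)
df_between = len(treatments) - 1
df_within = len(COUNTS) * len(treatments) - len(treatments)
ratio = (between / df_between) / (within / df_within)
print(f"between {between:.3f} ({df_between} d.f.), within {within:.3f} ({df_within} d.f.), "
      f"total {total:.3f}, ratio {ratio:.2f}")
```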

Randomisation 

23.36. Consider a two-way classification of pq members, the observed value of the
jth A-member of the ith B-class being x_{ij}. Following the line already considered in 21.48,
we will consider the z-distribution in the population of values obtained by permuting the
members in any B-class in all possible ways. There will thus be (q!)^p possible values of
z, all based on the observed values. We have already considered a case of this kind in
dealing with the problem of m rankings (16.29) and we shall follow the same procedure
in solving the more general problem.

Let the values be arrayed as

$$\begin{array}{cccc} x_{11} & x_{12} & \cdots & x_{1q} \\ x_{21} & x_{22} & \cdots & x_{2q} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pq} \end{array} \qquad (23.58)$$

If S_R is the sum of squares between rows, S_C that between columns and S the total, we
know that in the ordinary case considered earlier in the chapter, S_C is distributed as vχ²
with q − 1 d.f., and S − S_R − S_C as vχ² with (p − 1)(q − 1) d.f. It follows that

$$W = \frac{S_C}{S - S_R}, \text{ say}, \qquad (23.59)$$

is distributed in the Type I form

$$dP \propto W^{\frac{1}{2}(q-1) - 1}(1 - W)^{\frac{1}{2}(p-1)(q-1) - 1}\, dW. \qquad (23.60)$$

It is easier to work with W than with z, but there is of course no difficulty in passing from
one to the other.

We proceed to find the first four moments of W in the population of (q!)^p values obtained
by permuting the members of each row of (23.58) in all possible ways.


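23.37. If in (23.58) we increase the members of any row by a constant a, it is easily
seen that S_C and S − S_R remain unaffected, and hence so does W. Thus we may take
the mean of each row to be zero and then S_R = 0. With this origin we have

$$W = \frac{S_C}{S}. \qquad (23.61)$$

If now

$$R_{ik} = \sum_{j=1}^{q} x_{ij} x_{kj}, \qquad (23.62)$$

and the k-statistics of the q values x_{ij}, j = 1, …, q, are written k_{i1} (= 0), k_{i2}, k_{i3}, …, and

$$U = \sum{}' R_{ik}, \qquad (23.63)$$

the summation being over pairs i < k, we find, since S = (q − 1)Σ_i k_{i2} and S_C = (S + 2U)/p,

$$W = \frac{1}{p} + \frac{2U}{p(q-1)\sum_i k_{i2}}, \qquad (23.65)$$

$$E(R_{ik}) = 0, \qquad (23.66)$$

$$E(R_{ik}^2) = (q-1)\, k_{i2} k_{k2}. \qquad (23.67)$$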
Then, for the moments of U,

$$E(U) = 0, \qquad (23.69)$$

$$E(U^2) = (q-1)\sum{}' k_{i2} k_{k2}, \qquad (23.70)$$

$$E(U^3) = 6(q-1)\sum{}' k_{i2} k_{k2} k_{l2} + \frac{(q-1)(q-2)}{q}\sum{}' k_{i3} k_{k3}, \qquad (23.71)$$

E an\ _ 3 (3 - 1)® V' M p , (3 - 1) (3 - 2) (3 - 3) V®' j. i, 

)- — J^~Y Z *** *** + W^i) 

+ 3 (3 - 1)® { {E’ k,, k^,)^ ~ E' fcl, fc|*} 

+ l?lg -J) (g -.3) jr' k ,2 + 12 {q-l)E' ka **2 *m2 • (23.72) 

3 

where Σ' denotes summation over values for which the subscripts are unequal and
permutations are not allowed.

Finally, for the moments of W we have

$$E(W) = \frac{1}{p}, \qquad (23.73)$$

$$E(W - \bar W)^2 = \frac{4\sum' k_{i2} k_{k2}}{p^2 (q-1) \left(\sum k_{i2}\right)^2}, \qquad (23.74)$$

$$E(W - \bar W)^3 = \frac{48\sum' k_{i2} k_{k2} k_{l2}}{p^3 (q-1)^2 \left(\sum k_{i2}\right)^3} + \frac{8(q-2)\sum' k_{i3} k_{k3}}{p^3 q (q-1)^2 \left(\sum k_{i2}\right)^3}, \qquad (23.75)$$

B fIF — 

' ' ^® (3- l)* (2:' *;i2)® 3>M3 - 1)* (3 + 1) 

I 1152 E kf2 ^12 ^m2 
P*(3-l)* (Eki^)* 

16 (? - 2) (q - 3) r ka A:« 192 (g - 2) 27' A,., 

P* (3 + 1) (3) (3-1)* (3 - 1)* 3 (-2^ *«)* ■ 
(23.76)


These formulae can be derived in the manner of 16.33, but reference may be made to 
Pitman (1938) for further details. 
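The permutation moments can be checked by brute force on a small array. The following Python sketch enumerates all (q!)^p arrangements of a made-up 3 × 4 array; the data are hypothetical and chosen only so that the rows are skew. For each arrangement it computes W = S_C/(S − S_R). It then compares the exact mean, variance and third central moment with 1/p, with 4Σ'k_{i2}k_{k2}/{p²(q − 1)(Σk_{i2})²}, and with 48Σ'k_{i2}k_{k2}k_{l2}/{p³(q − 1)²(Σk_{i2})³} + 8(q − 2)Σ'k_{i3}k_{k3}/{p³q(q − 1)²(Σk_{i2})³}.

```python
from itertools import combinations, permutations, product

def w_statistic(rows):
    """W = S_C / (S - S_R) for a p x q array given as a sequence of rows (23.59)."""
    p, q = len(rows), len(rows[0])
    grand = sum(map(sum, rows)) / (p * q)
    s_total = sum((x - grand) ** 2 for row in rows for x in row)
    s_rows = q * sum((sum(row) / q - grand) ** 2 for row in rows)
    s_cols = p * sum((sum(row[j] for row in rows) / p - grand) ** 2 for j in range(q))
    return s_cols / (s_total - s_rows)

def k_stats(row):
    """Second and third k-statistics of one row."""
    q = len(row)
    m = sum(row) / q
    s2 = sum((x - m) ** 2 for x in row)
    s3 = sum((x - m) ** 3 for x in row)
    return s2 / (q - 1), q * s3 / ((q - 1) * (q - 2))

def permutation_moments(rows):
    """Mean, variance and third central moment of W over all (q!)^p arrangements."""
    ws = [w_statistic(arr) for arr in product(*(permutations(r) for r in rows))]
    n = len(ws)
    mean = sum(ws) / n
    var = sum((w - mean) ** 2 for w in ws) / n
    mu3 = sum((w - mean) ** 3 for w in ws) / n
    return mean, var, mu3

def predicted_moments(rows):
    """1/p and the variance and third-moment expressions quoted in the lead-in."""
    p, q = len(rows), len(rows[0])
    k2, k3 = zip(*(k_stats(r) for r in rows))
    total = sum(k2)
    pairs2 = sum(a * b for a, b in combinations(k2, 2))
    triples2 = sum(a * b * c for a, b, c in combinations(k2, 3))
    pairs3 = sum(a * b for a, b in combinations(k3, 2))
    var = 4 * pairs2 / (p ** 2 * (q - 1) * total ** 2)
    mu3 = (48 * triples2 / (p ** 3 * (q - 1) ** 2 * total ** 3)
           + 8 * (q - 2) * pairs3 / (p ** 3 * q * (q - 1) ** 2 * total ** 3))
    return 1 / p, var, mu3

rows = [[0, 1, 2, 9], [3, 3, 4, 10], [1, 5, 6, 6]]   # hypothetical data
print(permutation_moments(rows))
print(predicted_moments(rows))
```

The normal-theory variance 2(p − 1)/{p²(pq − p + 2)} of (23.77) can be printed alongside for comparison.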

23.38. We now consider how far the first four moments of W, as found above, agree
with the first four moments of the distribution (23.60). The mean and variance of the
latter are

$$\frac{1}{p} \quad\text{and}\quad \frac{2(p-1)}{p^2(pq - p + 2)}. \qquad (23.77)$$

The means agree exactly. For the variances to agree we must have, from (23.74) and
(23.77),

$$\frac{4\sum' k_{i2} k_{k2}}{p^2 (q-1)\left(\sum k_{i2}\right)^2} = \frac{2(p-1)}{p^2(pq - p + 2)}, \quad\text{i.e.}\quad \frac{2\sum' k_{i2} k_{k2}}{\left(\sum k_{i2}\right)^2} = \frac{(p-1)(q-1)}{pq - p + 2}. \qquad (23.78)$$

Writing

$$K = \frac{2p\sum' k_{i2} k_{k2}}{(p-1)\left(\sum k_{i2}\right)^2}, \qquad (23.79)$$

we find that (23.78) is equivalent to

$$K = \frac{p(q-1)}{pq - p + 2}. \qquad (23.80)$$


The ratio K may have any value from 0 to 1, the lower limit being approached when
one of the second k-statistics is much larger than the others, the upper limit when they
are all equal. Since, by (23.74) and (23.79), the variance of W is 2K(p − 1)/{p³(q − 1)}, all
that can be said about it is that it is not greater than 2(p − 1)/{p³(q − 1)} and that it takes
this value when the variance of each p-class is the same.

Turning to the third and fourth moments, we note that in many cases where the
variation is not too skew the terms involving the third and fourth k-statistics will be
negligible. A number of terms in (23.75) and (23.76) may thus be neglected, but even those
that remain are fairly complicated, and it is difficult to say how far the distribution of W
will approach the Type I distribution (23.60). In practice the values may be worked out and
compared. If there is reasonable agreement, the z-distribution of the variance ratio will hold
in the particular population which we are considering.

23.39. A better approach is to find the Type I distribution which has the same first two moments as W and to modify the z-test where necessary. It may be shown that when K is not too small the third and fourth moments of W and of the fitted Type I distribution are in fairly good agreement, so that we may expect a good fit.
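The fitting step can be sketched as follows, assuming the Type I distribution is the beta form on (0, 1): for a beta variate with mean μ and variance σ², the sum of the two parameters is ν = μ(1 − μ)/σ² − 1, and the third central moment is 2σ²(1 − 2μ)/(ν + 2). Matching μ = 1/p and σ² = 2K/{p²(q − 1)} gives the fitted parameters. When K takes the value (23.80) the fit returns (q − 1)/2 and (p − 1)(q − 1)/2, the normal-theory parameters. The helper names are hypothetical.

```python
def fit_type_one(p, q, K):
    """Beta (Type I) parameters a, b and third central moment matching mean 1/p, variance 2K/{p^2(q-1)}."""
    mean = 1.0 / p
    var = 2.0 * K / (p ** 2 * (q - 1))
    nu = mean * (1.0 - mean) / var - 1.0       # nu = a + b for a beta(a, b) variate
    a, b = mean * nu, (1.0 - mean) * nu
    mu3 = 2.0 * var * (1.0 - 2.0 * mean) / (nu + 2.0)  # beta third central moment
    return a, b, mu3


def mu3_closed_form(p, q, K):
    """The same third moment written as 8K^2 (p - 2) / [p^3 (q - 1) {(p - 1)(q - 1) + 2K}]."""
    return 8.0 * K ** 2 * (p - 2) / (p ** 3 * (q - 1) * ((p - 1) * (q - 1) + 2.0 * K))


if __name__ == "__main__":
    p, q = 8, 4
    K = (p - 1) * (q - 1) / (p * q - p + 2)  # the value (23.80)
    print(fit_type_one(p, q, K))              # a = (q - 1)/2, b = (p - 1)(q - 1)/2
```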

The Type I distribution with mean 1/p and variance 2K/{p²(q − 1)} has the mean and variance of W by definition. Its third moment is easily seen to be

$$\mu_3 = \frac{8K^2}{p^3(q-1)^2}\,\frac{p-2}{p-1}\left\{1 + \frac{2K}{(p-1)(q-1)}\right\}^{-1}. \qquad (23.81)$$


We have to see how far this differs from the actual third moment of W given by (23.75). 
Now 

3 S ki2 ^Ic2 ^12 ^ ^ ^k2 ^12 ^ ^i2 ^k2 

= S k^2 ^k2 ^12 — ^i2 ^ ^i2 — ^ ^< 2 ) 

= s k,2 {sr k,2 k^2 - ^ kli) + ^ 

and hence 

6 ki2 kfc2 ki2 ^ ^ 27 kl^ 

Since all the k's here concerned are positive,

JT S k^ ^ {2 i/g)* 


and hence 

\ZknY 

Hence, from (23.82) and (23.83),


ZJ^ I V = n -JT). 

!:*«)» (2: fc„)*/ < 

S.83), 

> 2K - 2-1- 2 (1 - A)» = ii* ^ 1 - 


I ^2 ^*2 ^/2 


(23.83) 


(23.84) 
Similarly, since

^ {wt'f - < (1 - JC) (• - i* - *«•) 

it appears that 


0 ^<2 ^*2 ^ Z 2 ^ ^2 ^ 

(-^ ^<2)* ^ 


. (23.85) 


On comparing (23.75) and (23.81), and assuming that the second term in the former may 
be neglected, we see that they differ by the factor whose limits we have found in (23.84) 
and (23.85), namely 


1 


K 


and 


3 +iC 

4 ‘ 


If K is not too small the limits are not very different from unity, and the third moments are accordingly in fairly good agreement.

In the same way but with rather more complicated algebra it may be shown that the 
fourth moments are in fair agreement. 

When all the rows are rankings, the case reduces to that considered in 16.33 et seq., 
and we have already seen that the distribution of W is closely approximated by the Type I 
distribution in that case. 


23.40. Suppose, now, that we have p classes of objects, one member of each class belonging to each of a second series of classes, q in number. As our hypothesis we will suppose that membership of the q-classes is independent of the variate-values, so that it is a matter of chance how the values in any p-class are distributed among the q-classes. On this hypothesis the variance ratio will follow the z-form approximately (subject to the conditions we have discussed above) in the population consisting of the (q!)^p permutations of the observed values; and this will be so whether the parent is normal or not.

By shaping the inference in this way, and making it conditional, we are thus able to apply the z-test even in cases of non-normality. The test of homogeneity still applies, but of course the inference is rather different from the usual type: it refers to the population of permutations of the observed values, not to a hypothetical parent population. This point has not, perhaps, been adequately emphasised in the past, and there still seems to be confusion on the subject.

Randomised Blocks 

23.41. The principle of testing in a conditional population has received its chief applications in a certain type of agricultural experiment (and analogous cases in other fields), known as a randomised block experiment. We are given p blocks of land and wish to test the existence of differential effects among q treatments (e.g. manurial treatments) on a crop to be grown on them. We divide each block into q plots and grow the crop on each of the pq plots. In any one block we apply a different treatment to each of the q plots, and we allocate the treatments among the plots at random.

This randomisation is an essential part of the process. If the treatments exert no effect the observed yields might have occurred in any order, and by making the inference in the proper way we are able to test in the z-distribution without assuming parent normality or the absence of fertility differences between plots of the same block. If, of course, the parent is near to normality the test is strengthened. Had we not allocated the treatments at random, the use of the z-distribution would not have been valid in the absence of (at least approximate) normality in the parent.
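For a very small randomised block layout the conditional population can be enumerated exactly. The sketch below, which uses hypothetical yields, takes W to be the treatment sum of squares divided by the within-block sum of squares (treatments plus residual). It builds all (q!)^p allocations obtained by permuting the values independently within each block, then checks the mean against (23.73) and the variance against (23.74). Enumeration is only feasible for tiny p and q; larger experiments would need sampled permutations, as in the Eden and Yates study.

```python
import statistics
from itertools import combinations, permutations, product


def w_statistic(rows):
    """Treatment SS / (treatment SS + residual SS); rows[i][k] = yield in block i under treatment k."""
    p, q = len(rows), len(rows[0])
    dev = [[x - sum(r) / q for x in r] for r in rows]             # remove block means
    totals = [sum(dev[i][k] for i in range(p)) for k in range(q)]
    ss_treat = sum(t * t for t in totals) / p                      # between treatments
    ss_within = sum(d * d for r in dev for d in r)                 # treatments + residual
    return ss_treat / ss_within


def randomisation_distribution(rows):
    """W for all (q!)^p allocations obtained by permuting values independently within each block."""
    per_block = [list(permutations(r)) for r in rows]
    return [w_statistic(list(combo)) for combo in product(*per_block)]


def pitman_variance(rows):
    """Variance of W from (23.74), using the second k-statistic of each block."""
    p, q = len(rows), len(rows[0])
    k2 = [statistics.variance(r) for r in rows]
    cross = sum(a * b for a, b in combinations(k2, 2))
    return 4.0 * cross / (p ** 2 * (q - 1) * sum(k2) ** 2)


if __name__ == "__main__":
    rows = [[1.0, 4.0, 2.0], [3.0, 0.0, 5.0], [2.0, 2.0, 7.0]]  # hypothetical yields, 3 blocks x 3 treatments
    ws = randomisation_distribution(rows)
    mean = sum(ws) / len(ws)
    var = sum((w - mean) ** 2 for w in ws) / len(ws)
    observed = w_statistic(rows)
    p_value = sum(w >= observed - 1e-12 for w in ws) / len(ws)  # proportion at least as large as observed
    print(len(ws), mean, var, pitman_variance(rows), p_value)
```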
23.42. It is of some importance to make clear the exact hypothesis which is being tested in this approach, since misunderstandings on the point have led to some rather heated controversy. If the treatments are numbered 1 to q, we consider the possible yield on the plot (j, k) if it received the lth treatment, say $x_{jk}(l)$. In actual fact only one of these treatments was carried out; the other values of $x_{jk}(l)$ are hypothetical and are based on our conception of what would happen if the treatments were differently distributed. The totality of values $x_{jk}(l)$ forms our hypothetical population. We are supposing that the observed yields can be expressed as

$$x_{jk}(l) = a_j + \zeta_{jk}(l),$$

where $a_j$ is an effect differing from block to block but constant within blocks, and $\zeta_{jk}(l)$ is the "individual" plot effect, which has a zero mean. The hypothesis we have considered in arriving at the validity of the z-test in conditional inferences is that every treatment affects every plot to the same extent, apart from the block effect $a_j$. In short, we suppose that $\zeta_{jk}(l)$ is the same for all l. This is the hypothesis usually tested in data from randomised blocks.

Neyman (1935a) proposed an alternative hypothesis, viz. that the mean effects of treatments over all blocks were the same, on the ground that we are interested in average treatment effects when testing fertilisers, not the effect on particular plots. The hypothesis here is that the mean of $x_{jk}(l)$ over all plots is the same for every l, which is not the same as before; and it appears from Neyman's analysis that the z-distribution under randomisation may not hold to such a satisfactory approximation as in the former case. Once again we have to stress the importance of gaining a clear idea of the hypothesis under test.

Example 23.8 (Eden and Yates, 1933 ; Pitman, 1938) 

Eden and Yates considered some data, based on actual experience of heights of wheat shoots, comprising eight classes of four, equivalent to the following measurements:

Class      1      2      3      4      5      6      7      8

         433    455    487½   407½   452½   257½   434½   475½
         429    419½   389    574½   436½   263½   520½   473½
         383    479    463½   477½   415    392    470    423½
         437    504½   469½   452½   418    420    532    481½

The variances of the eight classes, in units of 1/12, are then found to be

7628; 15,702; 22,669; 59,732; 3666; 90,593; 26,297; 8672.

The quantity K of equation (23.79) is then found to be 0·7577. The quantity (p − 1)(q − 1)/(pq − p + 2) = 21/26 = 0·8077. Thus (23.80) is approximately satisfied and we expect that the z-distribution will be approximately reproduced by the data under random permutations.

This was confirmed by Eden and Yates in a sampling experiment on the data. 1000 
sets of permutations were taken and z calculated for each. Agreement with expectation 
was good. 
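The following sketch recomputes K for this example from the eight class variances, with p = 8 classes of q = 4. The second and fourth variances are taken as 15,702 and 59,732, the values obtained by recomputing them from the class measurements. The common unit of 1/12 cancels in K.

```python
from itertools import combinations

# Class variances in units of 1/12 (the unit cancels in K)
variances = [7628, 15702, 22669, 59732, 3666, 90593, 26297, 8672]
p, q = len(variances), 4

total = sum(variances)
K = 2 * sum(a * b for a, b in combinations(variances, 2)) / total ** 2  # (23.79)
target = (p - 1) * (q - 1) / (p * q - p + 2)                            # right side of (23.80)
print(f"K = {K:.4f}; (p-1)(q-1)/(pq-p+2) = {target:.4f}")
```

This gives K ≈ 0·758 against 21/26 ≈ 0·808, the near-agreement on which the conclusion of the example rests.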

Example 23.9 (Friedman, 1937) 

A good example of data from populations which are probably far from normal is given in Table 23.18, showing the standard deviations of expenditures on various items for seven income-groups. The figures relate to families of wage-earners and lower-salaried workers in Minneapolis and St. Paul, U.S.A., in 1935-6.


TABLE 23.18

Standard Deviations of Expenditure on Certain Items of Families in Specified Income Groups.
(Figures in brackets are ranks.)

                                            Annual Family Income (dollars)
Category of Expenditure     750-        1000-       1250-       1500-       1750-       2000-       2250-2500

Housing                  100·3 (5)    68·4 (1)    89·5 (3)    77·9 (2)   100·0 (4)   108·2 (6)   184·9 (7)
Household operation       42·2 (1)    44·3 (3)    60·9 (4)    73·9 (6)    43·9 (2)    61·7 (5)   102·3 (7)
Food                      71·3 (1)    81·9 (2)   100·7 (7)    86·5 (3)   100·3 (5)    90·7 (4)   100·6 (6)
Clothing                  37·6 (1)    60·0 (3)    57·0 (2)    60·8 (4)    71·8 (5)    83·0 (6)   117·1 (7)
Furnishings, etc.         68·3 (2)    62·7 (1)    96·0 (6)    60·4 (3)   104·3 (7)    89·8 (5)    86·8 (4)
Transportation            46·3 (1)    82·2 (2)   129·8 (3)   181·0 (6)   172·3 (5)   164·8 (4)   246·8 (7)
Recreation                19·0 (1)    23·1 (2)    38·7 (3)    45·8 (4)    59·0 (7)    50·7 (5)    55·2 (6)
Personal care              8·3 (1)     8·4 (2)     9·2 (3)    14·3 (6)    10·6 (4)    15·8 (7)    12·5 (5)
Medical care              20·1 (1)    33·5 (2)    60·1 (4)    69·3 (5)   114·3 (7)    45·3 (3)   101·6 (6)
Education                  3·2 (1)     4·1 (2)    12·7 (4)    18·9 (5)     8·9 (3)    41·6 (6)    66·3 (7)
Community welfare          4·1 (1)    18·9 (5)     8·5 (2)    12·9 (3)    25·3 (7)    19·9 (6)    16·8 (4)
Vocation                   7·7 (1)    11·2 (5)    10·4 (2)    10·9 (4)    10·6 (3)    14·0 (6)    14·4 (7)
Gifts                      6·3 (1)    10·9 (2)    11·2 (3)    25·3 (4)    42·3 (5)    48·8 (6)    69·4 (7)
Other                      6·0 (5)     6·6 (4)    22·2 (7)     2·6 (2)     6·2 (6)     1·0 (1)     4·0 (3)


In brackets we show the ranks of the figures for the different income-groups within each category of expenditure. We wish to know whether the standard deviations for each category differ significantly for the different income levels. On the hypothesis that they do not, it is a matter of chance how the ranks fall.

The sums of ranks in each column are:

23, 36, 53, 57, 70, 70, 83.

The coefficient of concordance (vol. I, p. 411) is then

$$W = \frac{12S}{m^2(n^3 - n)},$$

where m = 14, n = 7 and S is the sum of squares of deviations of the sums of ranks from their mean m(n + 1)/2 = 56. We find S = 2620 and W = 0·4774. We may test the significance (vol. I, p. 419) by writing

$$z = \tfrac{1}{2}\log_e\frac{(m-1)W}{1-W} = 1.24,$$

$$\nu_1 = (n-1) - \frac{2}{m} = 5\tfrac{6}{7}, \qquad \nu_2 = (m-1)\nu_1 = 76\tfrac{1}{7}.$$

The value of z is highly significant, and we conclude that standard deviation is related to size of income: the more money there is to spend, the more variable is the expenditure on particular items.
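The arithmetic of this example can be reproduced with the short sketch below. It takes the fourth rank sum as 57, the value for which the seven sums total m·n(n + 1)/2 = 392 and give S = 2620. The z transformation and the degrees of freedom follow the continuity-adjusted form quoted above. The sketch does not look up percentage points of z.

```python
import math

rank_sums = [23, 36, 53, 57, 70, 70, 83]  # one sum per income group
m = 14                                    # number of rankings (categories of expenditure)
n = len(rank_sums)                        # number of objects ranked (income groups)

mean_sum = m * (n + 1) / 2
S = sum((r - mean_sum) ** 2 for r in rank_sums)  # sum of squared deviations of rank sums
W = 12 * S / (m ** 2 * (n ** 3 - n))              # coefficient of concordance
z = 0.5 * math.log((m - 1) * W / (1 - W))
nu1 = (n - 1) - 2 / m
nu2 = (m - 1) * nu1
print(S, round(W, 4), round(z, 3), round(nu1, 3), round(nu2, 3))
```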
NOTES AND REFERENCES


The idea of comparing variance between classes with the variance within classes in 
order to test homogeneity is found as early as Lexis (see footnote on page 119). Modern 
developments, and particularly the exact test of significance for normal parents, are due 
mainly to R. A. Fisher. Apart from papers by Irwin (1931 and 1934), connected accounts 
of the theory of variance analysis are hard to find, many points of theoretical interest being
scattered among papers which are primarily practical. 

For the general theory and applications reference may be made to Fisher’s Statistical 
Methods (1925a, 1944) and Design of Experiments (1935c, 1942), to a useful introductory 
account by Goulden (1939), and to the writings of Yates, particularly his Design and Analysis 
of Factorial Experiments (1937b).

On the question of randomisation in preserving the z-distribution see Eden and Yates 
(1933), Welch (1937, 1938a), and Pitman (1938). References to work on ranking are given 
at the end of Chapter 16. 

For work on the distribution of the greatest of a set of variances see Fisher (1929a, 
1940a), Cochran (1941), Stevens (1939a), Hartley (1938), and Finney (1941a). For further 
work on the square-root and sin⁻¹ transformations see Cochran (1940b), Beall (1942) and
Curtiss (1943). 

The literature of this subject is now very large. Some further references are given 
at the end of the next chapter. 

EXERCISES 

23.1. If $x_j$ (j = 1, ..., n) are a set of normal independent variates with variances $1/w_j$, consider the transformation

'^^k ^ ^ hj 

where the l's are defined by
hk = V(^k/^^) 

k = 1 . . . n 


1 

II II 


j — 2, 3, ... n 
k =j 

l]k ■" 

j - 2, 3, ...» - 1 
k = j + 1, . . . n 


Show that the l's are orthogonal and hence that

k^l Aj-1 

n 

is distributed as with n degrees of freedom. Noting that Ui = '^ Wj^Xj^/'\/Ew is dis- 
tributed normally with unit variance independently of • • • ^w» show that 

is distributed as ^l^h n — \ degrees of freedom. 
Hence derive the z-test for the analysis of variance with unequal numbers in a one-way classification.

(Irwin, 1942.) 

23.2. Verify the arithmetic in the analysis of variance of Example 23.5.

23.3. Verify the arithmetic in the analysis of variance of Example 23.6.

23.4. In a bivariate table with k rows (different rows corresponding to different values of the x-variate) write

{% - y)^ 


q = , Z (Wj. 4), 

CT- a; 

where is the variance of the y variate, the variance, and the frequency in the row 
with variate- value x. Thus 

nU .. ^ * 

and the ratio on the right is the variance-ratio in a one-way classification with unequal 
numbers. 

Show that, for any form of population, 

E{h) ^k- I E(q) =--N -k 

var A = 2 (fc - 1) I- {ft, - 3) \z - + 1 


v&iq = 2{N - k) + {ft, - 3) jiT— + - 2*1 

ixtljc J 

cov {A, q) = {ft, - 3) I* - 1 + 4 — -i’ ”1- 


Hence, approximately, that 


F /l -I- 1 

W £!(h)E{q)i 

„ £*(*)[ var* 4 cov (*, ?) 3 var 


W ^ EHh) E{h)E{q)^ E^ {q) 

In the case when all rows contain the same frequency 


and then 




\q) ■ (JV-*p ■ 

Hence show that the mean and variance of the variance-ratio are, to this order, independent of β₂, indicating that the z-test is not very sensitive to deviations from normality.

(E. S. Pearson, 1931b. It is rather remarkable that the correlation of h and q, far from disturbing the z-distribution, contributes to its stability.)



CHAPTER 24 


THE ANALYSIS OF VARIANCE— (2) 

Estimation of Class-differences

24.1. In the previous chapter we considered the analysis of variance mainly as the provider of tests of homogeneity. We have now to examine in more detail the problem of estimating class-effects, assuming that the homogeneity tests have shown them to exist. We discuss in the first instance the case in which there is only one member in each subclass, and for the sake of simplicity confine ourselves to a two-way classification, though the theory is quite general.

The fundamental hypothesis to be examined is that the data may be expressed in the form

$$x_{jk} = a_j + b_k + \zeta_{jk}, \qquad (24.1)$$

where $a_j$ and $b_k$ represent class-effects and ζ is a random normal variate with zero mean. Our analysis of variance will have shown whether this is an acceptable hypothesis, and our present problem is to estimate the unknown values of the a's and b's from the observed x's.

24.2. The joint probability of the ζ's is

$$\frac{1}{(2\pi v)^{pq/2}}\exp\Big\{-\frac{1}{2v}\sum_{j,k}(x_{jk} - a_j - b_k)^2\Big\}\,d\zeta_{11}\cdots d\zeta_{pq}, \qquad (24.2)$$

where v is the variance of ζ, and in conformity with the notation used in the previous chapter we have p A-classes and q B-classes. The maximum likelihood estimates of the a's and b's are then those which minimise the sum in curly brackets in (24.2), that is to say, the least-squares solution of the equations (24.1). In the usual way we find

$$\sum_{k=1}^{q}(x_{jk} - a_j - b_k) = 0,\quad j = 1,\dots,p; \qquad \sum_{j=1}^{p}(x_{jk} - a_j - b_k) = 0,\quad k = 1,\dots,q, \qquad (24.3)$$

which reduce to

$$q\,a_j + \sum_k b_k = q\,\bar{x}_{j.}, \qquad \sum_j a_j + p\,b_k = p\,\bar{x}_{.k}. \qquad (24.4)$$

Summing the first equation over j, dividing by p, and subtracting from the first, we obtain

$$a_j - \bar{a} - (\bar{x}_{j.} - \bar{x}_{..}) = 0,\quad j = 1,\dots,p, \qquad (24.5)$$

and similarly

$$b_k - \bar{b} - (\bar{x}_{.k} - \bar{x}_{..}) = 0,\quad k = 1,\dots,q. \qquad (24.6)$$

In (24.5) there are p equations, but if we sum them all we reach the identity 0 = 0, so that only p − 1 are independent. There is thus an element of indeterminacy which we may remove by supposing that ā = 0. Similarly we may take b̄ = 0, and then we have

$$a_j = \bar{x}_{j.} - \bar{x}_{..},\quad j = 1,\dots,p, \qquad (24.7)$$

$$b_k = \bar{x}_{.k} - \bar{x}_{..},\quad k = 1,\dots,q. \qquad (24.8)$$

Our estimate of any class-effect is equal to the deviation of the mean in that class from the total mean.

24.3. Evidently similar equations arise in the general n-way classification. We shall see below that they break down for unequal numbers in subclasses, except in a special case when the numbers are proportionate.

The assumption that the a's and b's have zero means is not, in effect, a restriction on generality but only a convention. If we prefer it, we may consider the slightly more general hypothesis that ζ has a mean m, in which case we have to minimise

$$\sum_{j,k}(x_{jk} - m - a_j - b_k)^2. \qquad (24.9)$$

This will be found to lead back to equations (24.7) and (24.8), with the additional equation for estimating m

$$m = \bar{x}_{..}. \qquad (24.10)$$

Or again, if we prefer to absorb m into the a-effects, we have

$$a_j = \bar{x}_{j.}, \qquad b_k = \bar{x}_{.k} - \bar{x}_{..}, \qquad (24.11)$$

the mean of the $a_j$ in this case not vanishing. Which form we use is a matter of convenience.


24.4. It is important to notice that the equations of estimation which we have just reached give each $a_j$ and $b_k$ independently of values in other classes. We obtain the same equation for $a_j$ whether we happen to be estimating other a's and b's or not. This property, as we shall see shortly, fails to hold if the numbers in subclasses are disproportionate. The situation is similar to that in which we can determine the constants in a regression line independently of the others if orthogonal polynomials are used, in that each constant is given by a separate equation not containing any of the others. Data of this kind are called orthogonal.

The direct comparison of class-means which is possible with orthogonal data can be seen, from general considerations, to be legitimate. In comparing $\bar{x}_{i.} - \bar{x}_{..}$ with $\bar{x}_{j.} - \bar{x}_{..}$, the estimates of the effects in the ith and jth A-classes, we are in each case averaging over q B-classes with one member in each. The B-classes therefore affect each mean to the same extent and do not affect their difference. If there are more members in some subclasses than in others, the means are unequally weighted with different B-effects and the comparison is invalidated.


24.5. Regarding $\bar{x}_{j.} - \bar{x}_{..}$ as the estimate of $a_j$ and $\bar{x}_{.k} - \bar{x}_{..}$ as the estimate of $b_k$, we see that the familiar equation

$$\sum_{j,k}(x_{jk} - \bar{x}_{..})^2 = \sum_{j,k}(\bar{x}_{j.} - \bar{x}_{..})^2 + \sum_{j,k}(\bar{x}_{.k} - \bar{x}_{..})^2 + \sum_{j,k}(x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2 \qquad (24.12)$$

can be regarded as an analysis of the sum of squares on the left, which has pq − 1 degrees of freedom, into terms in which there is one degree of freedom for every independent fitted constant and a residual with (p − 1)(q − 1) degrees of freedom. Every constant fitted reduces the number of degrees of freedom in the residual by unity.
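The estimates (24.7) and (24.8) and the partition (24.12) can be checked on any table with one member per subclass. The sketch below uses a small hypothetical 3 × 4 table. The first two terms of (24.12), summed over both subscripts, become q times the squared a-estimates and p times the squared b-estimates.

```python
def two_way_decomposition(x):
    """Estimates (24.7), (24.8) and the three terms of (24.12) for a p x q table, one value per cell."""
    p, q = len(x), len(x[0])
    grand = sum(map(sum, x)) / (p * q)
    row_means = [sum(r) / q for r in x]
    col_means = [sum(x[j][k] for j in range(p)) / p for k in range(q)]
    a = [m - grand for m in row_means]     # (24.7)
    b = [m - grand for m in col_means]     # (24.8)
    total = sum((x[j][k] - grand) ** 2 for j in range(p) for k in range(q))
    ss_a = q * sum(v * v for v in a)       # first term of (24.12)
    ss_b = p * sum(v * v for v in b)       # second term of (24.12)
    ss_resid = sum((x[j][k] - row_means[j] - col_means[k] + grand) ** 2
                   for j in range(p) for k in range(q))
    return a, b, total, ss_a, ss_b, ss_resid


if __name__ == "__main__":
    table = [[3.0, 5.0, 4.0, 6.0],
             [2.0, 7.0, 5.0, 5.0],
             [4.0, 6.0, 8.0, 7.0]]         # hypothetical data
    a, b, total, ss_a, ss_b, ss_resid = two_way_decomposition(table)
    print(total, ss_a + ss_b + ss_resid)   # the two sides of (24.12)
```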
Unequal Numbers in Subclasses

24.6. For a one-way classification we have already considered (23.7 and 23.8) the case where the numbers in subclasses are unequal. It was seen that the total sum of squares could be expressed as a sum between classes and a residual, which were independently distributed and whose ratio therefore provided a homogeneity test in the usual way.

When we try to extend this result to two-way or generally to n-way classifications, we begin to run into difficulties. We can still find, as shown below, an estimator of v based on p − 1 degrees of freedom and differences between A-classes, and one with q − 1 d.f. based on differences between B-classes; but these are no longer independent, and consequently we cannot subtract their sum from the total sum of squares in order to obtain a residual or an interaction term which also provides an unbiassed estimator.

On the other hand, there is now available an independent estimator of v which did not appear in the orthogonal case where only one member was included in each subclass. In fact, since there are several members in any given subclass, we can find an estimator of v based on those members alone; and we may pool all such to form an estimator with N − pq degrees of freedom, where there are pq subclasses. This estimator will be independent of subclass means and any estimators based on them, and hence provides a "residual" such as we require to carry out homogeneity tests.

24.7. Suppose we have a two-way classification into p A-classes and q B-classes, and let the number of members in the subclass $A_jB_k$ be $n_{jk}$. Let $\bar{x}_{jk}$ be the mean of these members. We may array the means as

$$\begin{matrix} \bar{x}_{11} & \bar{x}_{12} & \cdots & \bar{x}_{1q} \\ \bar{x}_{21} & \bar{x}_{22} & \cdots & \bar{x}_{2q} \\ \vdots & \vdots & & \vdots \\ \bar{x}_{p1} & \bar{x}_{p2} & \cdots & \bar{x}_{pq} \end{matrix} \qquad (24.13)$$

Now we may, in the first instance, test for homogeneity by ignoring the distinction between the A- and B-classifications and merely regarding the data as a one-way classification with pq classes. The usual test for homogeneity is then applicable. The sum of squares between means of classes will have pq − 1 degrees of freedom, the total N − 1 d.f., and the residual N − 1 − (pq − 1) = N − pq d.f. This residual, in fact, is the one mentioned in the previous section, and is based on the pooled sums of squares within the pq classes. The other term, based on pq − 1 degrees of freedom, is the sum

$$\sum_{j,k} n_{jk}(\bar{x}_{jk} - \bar{x})^2,$$

where x̄ is the mean of all N values, and is derivable from the array (24.13).

24.8. To test the effect of the A-classification separately we proceed as follows.

Any $\bar{x}_{jk}$ is the mean of $n_{jk}$ values and, on the usual hypothesis as to normality, will have variance $v/n_{jk}$. If x̄ is the mean of all N values we have

$$\bar{x} = \frac{1}{N}\sum_{j,k} n_{jk}\bar{x}_{jk}. \qquad (24.14)$$
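Let the marginal unweighted means in (24.13) be $\bar{x}_{j.}$ and $\bar{x}_{.k}$, so that

$$\bar{x}_{j.} = \frac{1}{q}\sum_k \bar{x}_{jk}, \qquad \bar{x}_{.k} = \frac{1}{p}\sum_j \bar{x}_{jk}. \qquad (24.15)$$

On the hypothesis of homogeneity the variance of $\bar{x}_{j.}$ is given by

$$\operatorname{var}\bar{x}_{j.} = \frac{v}{q^2}\sum_k\frac{1}{n_{jk}} = \frac{v}{N_j}, \qquad (24.16)$$

where

$$\frac{1}{N_j} = \frac{1}{q^2}\sum_k\frac{1}{n_{jk}}. \qquad (24.17)$$

Now let us regard the means $\bar{x}_{j.}$ as the means in p classes whose numbers are $N_j$, as is legitimate from (24.16). Then, writing

$$c = \frac{\sum_j N_j\bar{x}_{j.}}{\sum_j N_j}, \qquad (24.18)$$

we have for an unbiassed estimator of v

$$\frac{1}{p-1}\sum_j N_j(\bar{x}_{j.} - c)^2 = \frac{1}{p-1}\Big\{\sum_j N_j\bar{x}_{j.}^2 - c^2\sum_j N_j\Big\}. \qquad (24.19)$$

This estimator has p − 1 degrees of freedom and is distributed as vχ²/(p − 1). (This follows from the one-way case except that $N_j$ may not be integral; its general truth may be established as in Exercise 23.1.) It is independent of the residual with N − pq d.f., and hence the A-effects may be tested separately.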

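Similarly, if

$$\frac{1}{M_k} = \frac{1}{p^2}\sum_j\frac{1}{n_{jk}}, \qquad (24.20)$$

an unbiassed estimator of v is given by

$$\frac{1}{q-1}\Big\{\sum_k M_k\bar{x}_{.k}^2 - d^2\sum_k M_k\Big\}, \qquad (24.21)$$

where

$$d = \frac{\sum_k M_k\bar{x}_{.k}}{\sum_k M_k}, \qquad (24.22)$$

and this also may be compared with the independent estimator based on N − pq d.f.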
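A minimal sketch of the unweighted-means estimator (24.17) to (24.19) follows. The function name and argument layout are our own. The usage applies it to the sex classification of Example 24.1 (q = 8 breeds). It takes the unweighted female and male means 1·925,979 and 2·001,841 and the reciprocal sums 0·680,40 and 0·534,27, the latter being the total of the eight male reciprocals 1/n_jk. It should reproduce N₁ ≈ 94·06, N₂ ≈ 119·79 and an estimate of v near 0·3032.

```python
def unweighted_means_estimate(class_means, recip_sums, n_other):
    """Estimator (24.19) of v based on unweighted marginal means.

    class_means : unweighted means xbar_j. of the classes being tested
    recip_sums  : for each class j, the sum over k of 1/n_jk
    n_other     : number of classes in the other classification (q in (24.17))
    """
    N = [n_other ** 2 / s for s in recip_sums]                          # (24.17)
    c = sum(Nj * xj for Nj, xj in zip(N, class_means)) / sum(N)        # (24.18)
    ss = sum(Nj * (xj - c) ** 2 for Nj, xj in zip(N, class_means))     # numerator of (24.19)
    return N, c, ss / (len(N) - 1)


if __name__ == "__main__":
    N, c, est = unweighted_means_estimate([1.925979, 2.001841], [0.68040, 0.53427], 8)
    print([round(x, 4) for x in N], round(c, 6), round(est, 4))
```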


Example 24.1 (data from Brandt (1933) considered by Yates (1934a) ) 

Table 24.1 shows, for a number of breeds of pig, the numbers of each breed, divided into male and female, and the total logarithm of the percentage bacon yielded by the slaughtered carcases. The logarithm has been taken so as to normalise the variate.
TABLE 24.1

Numbers and Logarithm of Percentage Bacon in Breeds of Pigs.

                          Female                        Male
Breed               Number   Log. Percent.      Number   Log. Percent.
                             Bacon                       Bacon

Hampshire             33         66·55            89        181·04
Duroc Jersey          51         98·69           141        281·43
Tamworth              13         25·90            17         34·20
Yorkshire              4          7·62             9         17·58
Berkshire              8         14·64             4          8·20
Poland China          15         28·11            32         64·42
Chester White         35         66·90            47         90·52
Others                12         23·32            23         46·70

Totals               171        331·73           362        724·09


The total sum of squares, which is not obtainable from this table as it stands, we quote as 13·0142.

The class-means and reciprocals of class-frequencies are given in Table 24.2. 


TABLE 24.2

Class-Means and Reciprocals of Class-Frequencies for the Data of Table 24.1.

                          Female                    Male                 Unweighted
Breed                Mean        1/n_jk       Mean        1/n_jk       Mean of Means

Hampshire          2·016,667    0·030,30    2·034,158    0·011,24      2·025,412
Duroc Jersey       1·935,099    0·019,61    1·995,957    0·007,09      1·965,528
Tamworth           1·992,307    0·076,92    2·011,765    0·058,82      2·002,036
Yorkshire          1·905,000    0·250,00    1·953,333    0·111,11      1·929,167
Berkshire          1·830,000    0·125,00    2·050,000    0·250,00      1·940,000
Poland China       1·874,000    0·066,67    2·013,125    0·031,25      1·943,562
Chester White      1·911,429    0·028,57    1·925,957    0·021,28      1·918,694
Others             1·943,333    0·083,33    2·030,434    0·043,48      1·986,884

Unweighted Mean    1·925,979    0·680,40    2·001,841    0·534,27      1·963,910
of Means                        (Total)                  (Total)
Taking first the classification into male and female (q = 8), we find, from the relations

$$\frac{1}{N_j} = \frac{1}{q^2}\sum_k\frac{1}{n_{jk}},$$

$$N_1 = \frac{64}{0.680\,40} = 94.0623, \qquad N_2 = \frac{64}{0.534\,27} = 119.7896.$$

Then, from (24.18),

$$c = \frac{(94.0623 \times 1.925\,979) + (119.7896 \times 2.001\,841)}{94.0623 + 119.7896} = 1.968\,474.$$

Thus our estimate of v, with one degree of freedom, is

$$\sum_j N_j\bar{x}_{j.}^2 - c^2\sum_j N_j = 0.3032.$$


Similarly, for the eight breed-classes we find an estimate of v with seven degrees of freedom to be

$$\frac{1}{7}\Big\{\sum_k M_k\bar{x}_{.k}^2 - d^2\sum_k M_k\Big\} = \frac{0.6056}{7} = 0.0865.$$

Considering the 16 subclasses as a one-way classification, we find the following preliminary analysis (the arithmetical details of which we omit):

TABLE 24.3

Analysis of Variance of Data in Table 24.1.

                       Sum of Squares    d.f.    Quotient

Between classes             1·2715         15     0·0848
Residual                   11·7427        517     0·0227

Totals                     13·0142        532



The variance ratio here gives a value of z equal to 0·659, which is significant. Thus the data are not homogeneous.

We now require to decide whether the departure from homogeneity is due to breed, to sex, or to a combination of the two. For sex-differences we have found an estimate of v equal to 0·3032 with one d.f. Comparing this with the independent residual from Table 24.3 of 0·0227 with 517 d.f., we find that the effect of sex is significant. Similarly, for breed, the estimate of v is 0·0865 with 7 d.f., which again is significant. We conclude that both breed and sex influence the departure from homogeneity.
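The variance ratios behind these three comparisons can be recomputed from the sums of squares and degrees of freedom of Table 24.3 and Example 24.1, as in the sketch below, where z = ½ log_e F. With unrounded quotients the between-classes z comes out at about 0·658, agreeing with the text to within rounding. The sketch does not compute percentage points; significance would still be judged from tables of z with the stated degrees of freedom.

```python
import math

residual_ms = 11.7427 / 517  # residual quotient, 517 d.f.
mean_squares = {
    "between classes": 1.2715 / 15,  # 15 d.f.
    "sex": 0.3032 / 1,               # 1 d.f.
    "breed": 0.6056 / 7,             # 7 d.f.
}

results = {}
for label, ms in mean_squares.items():
    F = ms / residual_ms             # variance ratio
    results[label] = (F, 0.5 * math.log(F))
    print(f"{label:16s} F = {F:6.2f}  z = {results[label][1]:.3f}")
```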
It is particularly important to note that, since the estimates between breeds and between sexes are dependent, we cannot analyse the variance as follows:

TABLE 24.4

Incorrect Form of Analysis of Variance of Data of Table 24.1.

                       Sum of Squares    d.f.    Quotient

Between sexes               0·3032          1     0·3032
Between breeds              0·6056          7     0·0865
"Interaction"               0·3627          7     0·0518
Residual                   11·7427        517     0·0227

Totals                     13·0142        532



In fact the term shown as "interaction", calculated so as to make the sums of squares and degrees of freedom additive in the usual way, is not an unbiassed estimate of v. This is a critical point of difference between the orthogonal and the non-orthogonal case.


24.9. Suppose that the homogeneity test has shown the existence of significant class-effects. As before, we turn to consider the hypothesis that the data can be expressed as the sum of A- and B-effects separately with a random normal residual. Let $x_{jkl}$ be the typical member of the (j, k)th subclass, l varying from 1 to $n_{jk}$. Our hypothesis is then

$$x_{jkl} = a_j + b_k + \zeta_{jkl}, \qquad (24.23)$$

where ζ is normal with variance v. For convenience we will regard the mean of ζ as absorbed in the coefficients a, so that we may take ζ to have zero mean.

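The usual process of estimation of the a's and b's leads to the minimisation of the sum over all N values

$$\sum_{j,k,l}(x_{jkl} - a_j - b_k)^2. \qquad (24.24)$$

Differentiating with respect to $a_j$ and $b_k$, we find the series of equations

$$\sum_k\sum_l{}'(x_{jkl} - a_j - b_k) = 0,\quad j = 1,\dots,p; \qquad \sum_j\sum_l{}'(x_{jkl} - a_j - b_k) = 0,\quad k = 1,\dots,q, \qquad (24.25)$$

where Σ' denotes summation over the $n_{jk}$ values in a subclass. These equations reduce to

$$\sum_k n_{jk}a_j + \sum_k n_{jk}b_k = \sum_k\sum_l{}' x_{jkl}, \qquad \sum_j n_{jk}a_j + \sum_j n_{jk}b_k = \sum_j\sum_l{}' x_{jkl}.$$

Writing $N_{j.}$ for $\sum_k n_{jk}$ and $N_{.k}$ for $\sum_j n_{jk}$, we have

$$N_{j.}a_j + \sum_k n_{jk}b_k = \sum_{k,l} x_{jkl},\quad j = 1,\dots,p, \qquad (24.26)$$

$$\sum_j n_{jk}a_j + N_{.k}b_k = \sum_{j,l} x_{jkl},\quad k = 1,\dots,q, \qquad (24.27)$$

to which we may add

$$\sum_k b_k = 0. \qquad (24.28)$$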
Had we chosen to absorb the mean of ζ into the b's, this last equation would be replaced by $\sum_j a_j = 0$.

When all the n's are equal these equations reduce to the orthogonal case, and each a- or b-coefficient can be independently estimated. In the contrary case the equations have to be solved as they stand.


Example 24.2

Returning to the data of Table 24.1, we find for equations (24.26) and (24.27) the following, the values of the constants required being obtainable from the body or marginal sums of the table itself:

$$\begin{aligned}
171a_1 + 33b_1 + 51b_2 + 13b_3 + 4b_4 + 8b_5 + 15b_6 + 35b_7 + 12b_8 &= 331.73\\
362a_2 + 89b_1 + 141b_2 + 17b_3 + 9b_4 + 4b_5 + 32b_6 + 47b_7 + 23b_8 &= 724.09\\
33a_1 + 89a_2 + 122b_1 &= 247.59\\
51a_1 + 141a_2 + 192b_2 &= 380.12\\
13a_1 + 17a_2 + 30b_3 &= 60.10\\
4a_1 + 9a_2 + 13b_4 &= 25.20\\
8a_1 + 4a_2 + 12b_5 &= 22.84\\
15a_1 + 32a_2 + 47b_6 &= 92.53\\
35a_1 + 47a_2 + 82b_7 &= 157.42\\
12a_1 + 23a_2 + 35b_8 &= 70.02
\end{aligned}$$

To which we may add $a_1 + a_2 = 0$.

The solutions are

−a₁ = a₂ = 0·026,507;
b₁ = 2·017,259; b₂ = 1·967,367; b₃ = 1·999,799; b₄ = 1·928,267;
b₅ = 1·912,169; b₆ = 1·959,136; b₇ = 1·915,877; b₈ = 1·992,241.

These give us the "best" estimates of the mean effects of sex and breed on the hypothesis expressed by (24.23).

The mean of the b's is 1·961,514, which may be taken as an estimate of the mean of ζ, the b-effects then being the differences of the above b-values from this mean.
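The normal equations of this example can be set up and solved numerically, as in the sketch below, assuming NumPy is available. The sketch builds (24.26) and (24.27) from the subclass numbers and totals of Table 24.1 and appends the side condition a₁ + a₂ = 0. The ten normal equations have rank nine, so the eleven-row system is consistent and `numpy.linalg.lstsq` returns its exact solution. The same solution also gives the reduction in sum of squares due to the constants, Σ(estimate × total) − (grand total)²/N, of the kind used in Example 24.3.

```python
import numpy as np

# Subclass numbers n[j, k]: j = sex (female, male), k = breed, in the order of Table 24.1
n = np.array([[33, 51, 13, 4, 8, 15, 35, 12],
              [89, 141, 17, 9, 4, 32, 47, 23]], dtype=float)
# Subclass totals of log percentage bacon
t = np.array([[66.55, 98.69, 25.90, 7.62, 14.64, 28.11, 66.90, 23.32],
              [181.04, 281.43, 34.20, 17.58, 8.20, 64.42, 90.52, 46.70]])
p, q = n.shape

rows, rhs = [], []
for j in range(p):                   # equations (24.26)
    row = np.zeros(p + q)
    row[j] = n[j].sum()
    row[p:] = n[j]
    rows.append(row)
    rhs.append(t[j].sum())
for k in range(q):                   # equations (24.27)
    row = np.zeros(p + q)
    row[:p] = n[:, k]
    row[p + k] = n[:, k].sum()
    rows.append(row)
    rhs.append(t[:, k].sum())
side = np.zeros(p + q)
side[:p] = 1.0                       # a1 + a2 = 0 (mean absorbed in the b's)
rows.append(side)
rhs.append(0.0)

sol, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
a, b = sol[:p], sol[p:]
reduction = a @ t.sum(axis=1) + b @ t.sum(axis=0) - t.sum() ** 2 / n.sum()
print(np.round(a, 6), np.round(b, 6), round(b.mean(), 6), round(reduction, 4))
```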


24.10. Let us now consider the analysis of variance in the non-orthogonal case, when constants have been fitted by least squares in the above-mentioned way.

To make the discussion clearer we will regard the estimation as relating to p constants $a_j$ related by $\sum a_j = 0$, q constants $b_k$ related by $\sum b_k = 0$, and the mean m. There are thus p + q − 1 independent constants which, in effect, provide estimates of the means of subclasses. Whatever these means really are, the residual quotient based on N − pq degrees of freedom gives an unbiassed estimator of v, the common variance. We have now to analyse the remaining sum of squares based on pq − 1 d.f.

If the true (population) values of the constants are denoted by $\alpha_j$, $\beta_k$ and μ, the sum

$$\sum_{j,k,l}(x_{jkl} - \alpha_j - \beta_k - \mu)^2$$

is distributed as vχ² with N degrees of freedom. Developing yet another variation on a familiar theme, we show that the corresponding quantity

^ (^m - ^ - “i - Pk- ^ («> ““ ay)“ 

- 2: (6^ - pY . (24.29) 

is distributed as vχ² with N − (p + q − 1) d.f.

In fact, equations (24.26) and (24.27) show that the estimators a, b (and in our present case m also) are linear in the variables x. We can then find p + q − 1 orthogonal normal variables in terms of which they can be expressed. Their sum of squares will be distributed as vχ² with p + q − 1 degrees of freedom (not some multiple of χ², because the mean value must be p + q − 1 in virtue of 18.17). Thus the remaining term $\sum(x_{jkl} - a_j - b_k - m)^2$ is distributed as vχ² with N − (p + q − 1) degrees of freedom, independently of the portion due to the constants a, b and m.

Furthermore, the actual reduction in sums of squares, equivalent to the sum of the last three terms in (24.29), may be easily determined. Precisely as in the similar problem of evaluating residuals in a regression equation, we have

$$\sum_{j,k,l}(x_{jkl} - a_j - b_k - m)^2 = \sum_{j,k,l}x_{jkl}^2 - \sum_j a_j\sum_{k,l}x_{jkl} - \sum_k b_k\sum_{j,l}x_{jkl} - m\sum_{j,k,l}x_{jkl}, \qquad (24.30)$$

where, of course, summation takes place over all values.
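The identity (24.30) holds for any least-squares fit, including the rank-deficient indicator design of a non-orthogonal two-way layout. The sketch below checks it on a hypothetical unbalanced layout generated at random, assuming NumPy is available. The mean is absorbed in the a's, as in 24.9, so there is no separate m column. For a least-squares solution the residual sum of squares equals Σx² minus each estimate multiplied by its corresponding class total.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, size = 3, 4, 40
# Hypothetical unbalanced layout: observation i falls in subclass (A[i], B[i])
A = rng.integers(0, p, size=size)
B = rng.integers(0, q, size=size)
x = rng.normal(loc=2.0, size=size)

X = np.zeros((size, p + q))          # indicator design: a-columns then b-columns
X[np.arange(size), A] = 1.0
X[np.arange(size), p + B] = 1.0

coef, *_ = np.linalg.lstsq(X, x, rcond=None)   # a least-squares solution (design is rank-deficient)
totals = X.T @ x                               # class totals sum_{k,l} x_jkl and sum_{j,l} x_jkl
direct = np.sum((x - X @ coef) ** 2)           # left side of (24.30)
shortcut = x @ x - coef @ totals               # right side of (24.30)
print(direct, shortcut)
```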

24.11. The total sum of squares is already calculated about the estimated mean
m, so that the reduction for the term $m \sum x_{jkl} = N m^2$ has already been taken into account.
The total sum is then distributed as $v\chi^2$ with N − 1 d.f., as we already know. We know
further that we can split off the independent residual sum based on N − pq degrees of
freedom. This leaves us with a sum based on pq − 1 d.f. From the previous section it
follows that we can analyse this sum into two parts: (a) the sum of squares due to fitting
the constants, accounting for p + q − 2 d.f., and (b) the remainder, based on
pq − 1 − (p + q − 2) = (p − 1)(q − 1) d.f. This remainder is independent of the sum
of squares due to fitting constants and, if the effects are additive, provides an unbiassed estimator of v. If the ratio
of its quotient to the residual based on N − pq d.f. is significant, the hypothesis of additive
effects breaks down. In short, we may regard this quantity as an interaction term.
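The fitting of constants in 24.10 and the split in 24.11 can be sketched with a general least-squares routine. The sketch below assumes Python with NumPy (the text itself works by hand from the normal equations (24.26) and (24.27)). Effect coding imposes $\sum a = \sum b = 0$. The function checks identity (24.30) and returns the sum of squares due to the constants (p + q − 2 d.f.) and the interaction remainder ((p − 1)(q − 1) d.f.). The function names and the small data set are hypothetical.

```python
import numpy as np

def effect_columns(labels, levels):
    """Effect coding: level c scores +1 in column c and the last level scores -1 throughout,
    so the implied constants sum to zero, as with sum(a) = 0 and sum(b) = 0."""
    last = (labels == levels - 1).astype(float)
    return np.column_stack([(labels == c).astype(float) - last for c in range(levels - 1)])

def two_way_unequal(x, j, k, p, q):
    """Fit x_jkl = m + a_j + b_k by least squares (unequal subclass numbers) and split the
    sum of squares between the pq subclasses into constants and interaction."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones_like(x), effect_columns(j, p), effect_columns(k, q)])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    m = coef[0]
    a = np.append(coef[1:p], -coef[1:p].sum())
    b = np.append(coef[p:], -coef[p:].sum())
    resid_ss = float(((x - m - a[j] - b[k]) ** 2).sum())
    # Right-hand side of (24.30): sum x^2 - sum_j a_j X_j. - sum_k b_k X_.k - m X
    rhs = float((x ** 2).sum() - a @ np.bincount(j, weights=x, minlength=p)
                - b @ np.bincount(k, weights=x, minlength=q) - m * x.sum())
    constants = float(((x - x.mean()) ** 2).sum()) - resid_ss       # p + q - 2 d.f.
    cell = j * q + k                                                 # every subclass must be non-empty
    n_cell = np.bincount(cell, minlength=p * q)
    cell_means = np.bincount(cell, weights=x, minlength=p * q) / n_cell
    between = float((n_cell * (cell_means - x.mean()) ** 2).sum())  # pq - 1 d.f.
    return {"resid_ss": resid_ss, "rhs_24_30": rhs, "constants": constants,
            "interaction": between - constants}                      # (p - 1)(q - 1) d.f.

# Hypothetical 2 x 3 classification with unequal subclass numbers.
n = np.array([[3, 5, 2], [6, 2, 4]])
p, q = n.shape
cells = np.repeat(np.arange(p * q), n.ravel())
j, k = cells // q, cells % q
rng = np.random.default_rng(5)
x = 10 + np.array([1.0, -1.0])[j] + np.array([0.5, 0.0, -0.5])[k] + rng.normal(size=len(cells))
print(two_way_unequal(x, j, k, p, q))
```

With exactly additive data the interaction term vanishes, which is a convenient sanity check on the arithmetic.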

24.12. One important point to notice in this connection is that the interaction term
depends on whether p + q − 2 or fewer constants are fitted. In the orthogonal case we
can determine an interaction term once and for all, however things stand in regard to the
estimation of inter-class effects; but for non-orthogonal data the number of class-effects
estimated affects the interaction term, and if necessary a new significance test has to be
applied if further estimates are calculated. The situation is similar to the testing of
regression coefficients when orthogonal polynomials are not employed.

Example 24.3 

Returning again to the data discussed in Examples 24.1 and 24.2, let us regard the
means in all 16 subclasses as simultaneously under estimate. For the reduction in sum
of squares due to the constants we find, using the values of a and b found in Example 24.2,

$$0.026607\,(-331.73 + 724.09) + (2.017269 \times 247.69) + (1.967367 \times 380.12) + \cdots - \frac{(1065.82)^2}{533} = 1.0416.$$

Here, for instance, the term $2.017269 \times 247.69$ is given by multiplying $b_1$ by the total $\sum_{j,l} x_{j1l}$ already
found. The last term removes the effect of including the mean among the b's.
The sum of squares between classes was found in Example 24.1 to be 1.2716, based
on 15 d.f. We then have

                                            Sum of Squares.   d.f.   Quotient.
Sex and breed (estimation of constants)         1.0416          8     0.1302
Interaction                                     0.2300          7     0.0329
Between classes                                 1.2716         15

Comparing the interaction term 0.0329 (7 d.f.) with the residual 0.0229 (617 d.f.) we see
that it is not significant.

If we neglect sex and consider breed alone, we have only to estimate eight constants
$b_1, \ldots, b_8$ subject to $\sum (b) = 0$. The sum of squares for breed alone is given by

$$\frac{(247.69)^2}{n_{\cdot 1}} + \frac{(380.12)^2}{n_{\cdot 2}} + \cdots - \frac{(1065.82)^2}{533} = 0.7263,$$

$n_{\cdot k}$ being the number of members of the kth breed.

Similarly the sum of squares for sex alone will be found to be 0.4224. We have the
following analysis:


TABLE 24.6

Further Analysis of Variance of Data of Table 24.1.

                                              Sum of Squares.   d.f.   Quotient.
Test for Sex
  Between breed (estimation of constants)         0.7263          7
  Sex                                             0.3162          1     0.3162
  Sex and breed                                   1.0416          8
Test for Breed
  Between sex (estimation of constants)           0.4224          1
  Breed                                           0.6191          7     0.0884
  Sex and breed                                   1.0416          8
Interaction                                       0.2300          7     0.0329
Between classes                                   1.2716         15


Here, for instance, if we test for sex there are seven independent constants for breed
and one for sex, the latter being the only one that interests us; and similarly for breed.
On comparison with the residual 0.0227 both sex and breed are found to be significant.

24.13. The reader may perhaps find the various tests of Examples 24.1 and 24.3
confusing, and we accordingly summarise our results for the case of unequal numbers in
subclasses.

In every case, except where each subclass contains not more than one member, an
estimate of the common variance v may be obtained, with N − pq d.f., by pooling the
sums of squares within the pq subclasses. Call this $V_1$.
Homogeneity may then be tested (a) by considering the pq classes as a single one-way
classification and comparing the quotient between means with $V_1$, or (b) by calculating
for either classification separately the estimates based on (24.19) and comparing them with $V_1$.

If homogeneity is rejected in favour of the additive effect of classes expressed by the
usual hypothesis, the sum of squares between all classes, based on pq − 1 d.f., may be split
into independent sums related to the fitting of the constants and to an interaction term.
The latter can be compared with $V_1$ to test for interaction. If this is not significant, alternative
tests for effects between A-classes and between B-classes may be derived by testing the
sum of squares attributable to the fitting of the respective constants against $V_1$. These
tests are, in effect, tests of one classification neglecting the effect of the other, and may not be
accurate if the latter effect is not negligible. It is probably better to fit constants to both
classifications simultaneously in the first instance.


Proportionate Frequencies

24.14. We have previously spoken of non-orthogonal data as meaning any classification
with unequal frequencies in the subclasses, but there is one other case of unequal
frequencies for which orthogonality exists, namely the one in which frequencies are proportionate,
i.e. there are marginal frequencies $n_{j\cdot}$ and $n_{\cdot k}$ such that

$$n_{jk} = \frac{n_{j\cdot}\, n_{\cdot k}}{N}. \qquad (24.31)$$

Here the means of A-classes are estimates of the individual corresponding a's (though it
must not be overlooked that they are based on different numbers of members in margins),
and the sum of squares between A-means may be computed in the usual manner appropriate
to a one-way classification with unequal numbers. Similarly for B. The interactions
may be estimated by subtracting the A- and B-sums from the sum of squares between
classes. We leave it to the reader to verify these statements.
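One way to carry out the verification numerically is sketched below in Python with NumPy. This is our choice of tool, and the margins and data are hypothetical. The sketch builds a table satisfying (24.31), fits $m + a_j + b_k$ by least squares, and compares the reduction in sum of squares with the sum of the two one-way sums of squares computed from the margins. Proportionate frequencies make the centred A- and B-indicator columns orthogonal, so the two should agree exactly.

```python
import numpy as np

def marginal_ss(x, labels, levels):
    """Sum of squares between class means, one-way classification with unequal numbers."""
    n = np.bincount(labels, minlength=levels)
    means = np.bincount(labels, weights=x, minlength=levels) / n
    return float((n * (means - x.mean()) ** 2).sum())

def additive_fit_ss(x, j, k, p, q):
    """Reduction in sum of squares (about the mean) from fitting m + a_j + b_k by least squares."""
    design = np.column_stack([np.ones(len(x)), np.eye(p)[j][:, 1:], np.eye(q)[k][:, 1:]])
    coef, *_ = np.linalg.lstsq(design, x, rcond=None)
    resid = x - design @ coef
    return float(((x - x.mean()) ** 2).sum() - (resid ** 2).sum())

# Hypothetical proportionate table: n_jk is proportional to (row factor) x (column factor).
n = np.outer([2, 4, 6], [1, 3])            # 3 A-classes by 2 B-classes, N = 48
p, q = n.shape
cells = np.repeat(np.arange(p * q), n.ravel())
j, k = cells // q, cells % q
rng = np.random.default_rng(0)
x = 5.0 + j + 0.5 * k + rng.normal(size=len(cells))

ss_a, ss_b = marginal_ss(x, j, p), marginal_ss(x, k, q)
print(np.isclose(additive_fit_ss(x, j, k, p, q), ss_a + ss_b))   # True
```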


Special Case of 2 × 2 × ... Classification

24.15. The foregoing analysis can be extended to the n-way classification, but in 
the general case the solution of the equations becomes rather complex and the arithmetic 
a considerable nuisance. Where, however, the classifications are simple dichotomies the 
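problem simplifies to a great extent. For instance, in equations (24.27), if there are only
two values of $a_j$, which we may take to be +a and −a, we have, writing $X_{jk}$ for the total $\sum_l x_{jkl}$,

$$(n_{1k} + n_{2k})\, b_k = X_{1k} + X_{2k} - (n_{1k} - n_{2k})\, a.$$

We have selected the a's so that $\sum (a) = 0$, which implies that the mean m is amalgamated
with the b's. Substituting for the b's in (24.26), we find

$$a \Big\{ N - \sum_k \frac{(n_{1k} - n_{2k})^2}{n_{1k} + n_{2k}} \Big\} = \sum_k \Big\{ X_{1k} - X_{2k} - \frac{(n_{1k} - n_{2k})(X_{1k} + X_{2k})}{n_{1k} + n_{2k}} \Big\},$$

which reduces to

$$a = \frac{1}{2}\,\frac{\sum_k \dfrac{n_{1k} n_{2k}}{n_{1k} + n_{2k}} (\bar{x}_{1k} - \bar{x}_{2k})}{\sum_k \dfrac{n_{1k} n_{2k}}{n_{1k} + n_{2k}}}. \qquad (24.32)$$

Thus 2a, the difference between the two A-effects, is the weighted mean of the differences of corresponding B-class means,
with weights $n_{1k} n_{2k}/(n_{1k} + n_{2k})$, and may be determined direct. So generally for a 2 × 2 × 2 ... classification. The differences
may be tested for homogeneity by the z-test, which in this case reduces to the t-test.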
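The sketch below gives a quick numerical check of (24.32) in Python with NumPy, using hypothetical data. It compares the weighted-difference formula with a direct least-squares fit of $x_{jkl} = \pm a + b_k$, where the mean is absorbed into the b's as in the text. The function and variable names are illustrative, not from the text.

```python
import numpy as np

def dichotomy_a(n1, n2, mean1, mean2):
    """(24.32): a is half the weighted mean of the differences of B-class means,
    the weights being n1k * n2k / (n1k + n2k)."""
    n1, n2 = np.asarray(n1, dtype=float), np.asarray(n2, dtype=float)
    w = n1 * n2 / (n1 + n2)
    return 0.5 * float(np.sum(w * (np.asarray(mean1) - np.asarray(mean2))) / w.sum())

# Hypothetical 2 x 3 table with unequal subclass numbers.
n = np.array([[3, 5, 2], [6, 2, 4]])
cells = np.repeat(np.arange(6), n.ravel())
j, k = cells // 3, cells % 3
rng = np.random.default_rng(1)
x = np.array([0.4, -0.4])[j] + np.array([10.0, 11.0, 12.5])[k] + rng.normal(size=len(cells))

# Direct least squares: a column of +1/-1 for the A-effect and one indicator per B-class.
design = np.column_stack([np.where(j == 0, 1.0, -1.0), np.eye(3)[k]])
a_lsq = np.linalg.lstsq(design, x, rcond=None)[0][0]

means = np.bincount(cells, weights=x, minlength=6) / n.ravel()
a_formula = dichotomy_a(n[0], n[1], means[:3], means[3:])
print(np.isclose(a_lsq, a_formula))   # True
```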


24.16. In view of the relative complexity of the non-orthogonal case, it is natural
to wonder whether any serious error would be committed if we regarded the p × q table
of array means as an ordinary two-way table with one member in each class and analysed
the variance accordingly. Evidently such a procedure sacrifices a lot of information about
variation in subclasses, but that is not the point. Is the analysis valid?

The hypothesis on which the analysis is based is equality of variance in subclasses. 
If the numbers in subclasses are very unequal the means based on them will have very 
unequal variances, and we expect that the analysis may be misleading. If, however, the 
numbers are close to equality the analysis will probably be approximately correct. 

Example 24.4 

Reverting once again to the data considered in earlier examples, we have the following
analysis for the variance of the 2 × 8 table of class-means:

                   Sum of Squares.   d.f.   Quotient.
Between sex            0.3032          1     0.3032
Between breed          0.2635          7     0.0376
Residual               0.2387          7     0.0341
Totals                 0.8054         15

The sum of squares between sex is the same as before, as it must be for a dichotomy,
but the effect of breed is seriously underestimated and would not be judged significant by
comparison with the interaction term, which is our residual. The numbers in the breed-classes
are, in fact, too different to justify the approximation.

The Missing Plot Technique 

24.17. The simplicity of the analysis of variance in the orthogonal case and the
economy gained by keeping the number of values as low as possible often lead to the
carrying out of experiments with only one member in each subclass. But this has a certain
practical danger in that the value in a subclass may be lost through circumstances beyond
the experimenter's control. For instance, an animal may die in the course of an experiment,
or a crop on a particular plot may be ruined by pests; or sometimes a record may actually
be lost after measurements have been carried out. In such cases we may estimate the
missing values and perform a variance-analysis in the following way.

24.18. Consider in the first place a p × q classification with certain missing values,
r in number. We assume as usual that the variate-values are expressible in the form

$$x_{jk} = m + a_j + b_k + \epsilon_{jk}, \qquad (24.33)$$

and we know that, for complete data, the "best" estimators of the constants are

$$m = \bar{x}_{\cdot\cdot}, \qquad a_j = \bar{x}_{j\cdot} - \bar{x}_{\cdot\cdot}, \qquad b_k = \bar{x}_{\cdot k} - \bar{x}_{\cdot\cdot}. \qquad (24.34)$$

The quantities on the right are, however, unknown to us because of the missing values.
Suppose that we estimate the constants by minimising

$$\sum{}' (x_{jk} - a_j - b_k - m)^2, \qquad (24.35)$$

where the summation $\sum'$ takes place over known values. Our estimators are then determinate
and may be written $a'_j$, $b'_k$ and $m'$.
We will now estimate the missing value on the plot (j, k) by the equation

$$X_{jk} = a'_j + b'_k + m'. \qquad (24.36)$$

We have

$$\sum (x_{jk} - a_j - b_k - m)^2 = \sum{}' (x_{jk} - a_j - b_k - m)^2 + \sum_r (X_{jk} - a_j - b_k - m)^2, \qquad (24.37)$$

where on the left the estimates $X_{jk}$ stand in place of the missing observations and $\sum_r$ denotes
summation over the r missing plots. Let us now consider this as a function to be minimised, involving the unknowns a, b, m
and r further unknowns $X_{jk}$. The equations giving the latter will be obtained by differentiating
(24.37) with respect to each $X_{jk}$, and in fact are typified by

$$X_{jk} = a'_j + b'_k + m',$$

that is to say, by (24.36). The other constants are given by such equations as

$$\sum{}' (x_{jk} - a'_j - b'_k - m') + \sum_r (X_{jk} - a'_j - b'_k - m') = 0. \qquad (24.38)$$

The second term vanishes, and hence we obtain the same minimal values for $a'_j$, $b'_k$ and
$m'$ as by minimising (24.35) by itself. Furthermore, the equations of estimation (24.38)
may be written

$$\sum (x_{jk} - a'_j - b'_k - m') = 0, \qquad (24.39)$$

where the summation takes place over all values, those of the observed x's where known
and over the estimated X's where values are missing.

It follows that if we write unknowns $X_{jk}$ for the r missing values, ascertain the residual sum of
squares, which will be a function of observations and these r unknowns, and minimise
it for variation in these unknowns, we shall obtain equations providing estimates of the
unknowns equivalent to (24.36). The following example illustrates the method.
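The argument of 24.18 says that the missing values are those that make each $X_{jk}$ equal to its own fitted value $a'_j + b'_k + m'$ when the constants are computed from the completed table. One simple way to find them is to start from the mean of the existent values and repeatedly replace each missing cell by its fitted value from the complete-table estimators (24.34) until nothing changes. The sketch below does this in Python with NumPy. This fixed-point iteration is our illustrative choice, not the procedure of the text, which solves the minimising equations directly as in the example that follows. For a single missing value it should agree with the closed form $(pR + qC - G)/\{(p - 1)(q - 1)\}$. Here R and C are the known totals of the row and column containing the missing plot, and G is the known grand total. The data are hypothetical.

```python
import numpy as np

def fill_missing_plots(x, tol=1e-12, max_iter=10_000):
    """Estimate missing cells (NaN) of a p x q table with one value per cell.

    Each missing cell is repeatedly set to its fitted additive value
    row mean + column mean - grand mean of the completed table (24.34),
    until the values stop changing; the limit satisfies (24.36).
    """
    x = np.asarray(x, dtype=float)
    missing = np.isnan(x)
    filled = np.where(missing, np.nanmean(x), x)     # start from the mean of existent values
    for _ in range(max_iter):
        fitted = (filled.mean(axis=1, keepdims=True)
                  + filled.mean(axis=0, keepdims=True) - filled.mean())
        new = np.where(missing, fitted, x)
        if np.max(np.abs(new - filled)) < tol:
            return new
        filled = new
    raise RuntimeError("iteration did not converge")

# One missing value (hypothetical data): compare with the closed form.
table = np.array([[3.0, 5.0, 4.0, 6.0],
                  [2.0, np.nan, 3.0, 5.0],
                  [4.0, 6.0, 5.0, 7.0]])
p, q = table.shape
R, C, G = np.nansum(table[1]), np.nansum(table[:, 1]), np.nansum(table)
print(fill_missing_plots(table)[1, 1], (p * R + q * C - G) / ((p - 1) * (q - 1)))  # both close to 4
```

With several missing values the same loop applies unchanged, provided no row or column is entirely empty.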

Example 24.5 (Yates, 1933b)

The following table shows the measurements of intensity of infection of certain potato 
tubers under eight manurial treatments in ten blocks. 


TABLE 24.6

Intensity of Infection of Potato Tubers.

                                             Blocks
Treatments    1      2      3      4      5      6      7      8      9     10    Totals.
    1       3.65   2.29    b     2.00   3.34   3.83   3.86   3.60   2.23   2.91   27.61 + b
    2       2.30   4.03   2.64   2.82   3.29   2.93    f     2.66   2.20   2.30   24.96 + f
    3       3.96   3.62   3.46   2.60   2.94   3.70   3.82   2.64   3.18   3.69   33.41
    4       2.99   3.99   2.90   3.97   4.49   4.70   3.86    h     3.60   3.60   33.99 + h
    5        a     3.07   3.49   1.07   3.99   3.48   3.80   3.68   3.24   2.70   28.62 + a
    6       2.36   3.47   2.64   3.17   3.26   3.28    g      i     3.07   3.12   24.37 + g + i
    7       2.16   2.34   1.96   2.60   3.77    d     3.20   3.47   2.67   3.33   26.60 + d
    8       3.16   2.62   2.39   3.68    c      e     3.86   3.36   2.60   4.13   26.69 + c + e
 Totals    20.48  26.33  19.38  21.81  26.08  21.92  22.39  19.10  22.69  26.77   223.86 + a + b + c
            + a           + b           + c    + d    + f    + h                  + d + e + f + g
                                               + e    + g    + i                  + h + i
There are nine missing values in this table, indicated by the letters a to i. Omitting
purely numerical terms, which are irrelevant for the purposes of minimisation, we have
for the total sum of squares,

$$a^2 + b^2 + c^2 + \cdots + i^2 - \frac{1}{80}(223.86 + a + b + c + \cdots + i)^2;$$

for the sum of squares between blocks,

$$\frac{1}{8}\{(20.48 + a)^2 + (19.38 + b)^2 + \cdots + (19.10 + h + i)^2\} - \frac{1}{80}(223.85 + a + b + c + \cdots + i)^2;$$

and for that between treatments,

$$\frac{1}{10}\{(27.61 + b)^2 + (24.96 + f)^2 + \cdots + (25.59 + c + e)^2\} - \frac{1}{80}(223.85 + a + b + c + \cdots + i)^2.$$

The residual sum of squares is the difference of the first and the sum of the second and
third of these expressions. For minimisation we differentiate with respect to a, b, ..., i
in turn. On some arithmetic simplification we find


$$
\begin{aligned}
63a + b + c + d + e + f + g + h + i &= 209.11\\
a + 63b + c + d + e + f + g + h + i &= 190.03\\
a + b + 63c + d - 7e + f + g + h + i &= 231.67\\
a + b + c + 63d - 9e + f + g + h + i &= 199.36\\
a + b - 7c - 9d + 63e + f + g + h + i &= 200.07\\
a + b + c + d + e + 63f - 9g + h + i &= 199.73\\
a + b + c + d + e - 9f + 63g + h - 7i &= 196.01\\
a + b + c + d + e + f + g + 63h - 9i &= 239.07\\
a + b + c + d + e + f - 7g - 9h + 63i &= 162.11
\end{aligned}
$$


This set of linear equations can, of course, be solved by routine methods, but also by iterative 
processes as follows : — 

The mean of existent values is 3.15. Assume this to be approximately the values of
b, c, ..., i. Then for a we have, from the first of the above equations,

$$a = \tfrac{1}{63}\{209.11 - (8 \times 3.15)\} = 2.92.$$

Taking this value of a and 3.15 for c, d, ..., we find for b from the second equation,

$$b = \tfrac{1}{63}\{190.03 - (7 \times 3.15) - 2.92\} = 2.62.$$

Similarly, from the third equation,

$$c = \tfrac{1}{63}\{231.67 + (2 \times 3.15) - 2.92 - 2.62\} = 3.69,$$

and so on. On reaching i we recalculate a from the first equation, using the approximations
to the values of the other constants already obtained; and so on until our values do not
alter. In this case only a second approximation is necessary, the values being:



                   a      b      c      d      e      f      g      h      i
First Approx.     2.92   2.62   3.69   3.27   3.76   3.20   3.60   3.88   3.22
Second Approx.    2.88   2.58   3.73   3.33   3.76   3.32   3.61   3.89   3.22
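The iterative solution just described is the Gauss-Seidel method. The sketch below is in Python with NumPy, and its helper names are ours. It rebuilds the coefficients of the nine equations from the positions of the missing plots. Differentiating the residual sum of squares and multiplying by 40 gives, for each missing value:

- 80 for itself,
- −10 for any missing value in the same block,
- −8 for any missing value under the same treatment,
- +1 throughout from the correction term.

This yields 63 on the diagonal, −9 within a block and −7 within a treatment. The sketch then repeats the hand computation. The right-hand sides are copied from the equations above, and the block and treatment positions are read off Table 24.6.

```python
import numpy as np

# Missing plots of Table 24.6 as (block, treatment), read off the table.
MISSING = {"a": (1, 5), "b": (3, 1), "c": (5, 8), "d": (6, 7), "e": (6, 8),
           "f": (7, 2), "g": (7, 6), "h": (8, 4), "i": (8, 6)}
# Right-hand sides of the nine equations, as printed.
RHS = np.array([209.11, 190.03, 231.67, 199.36, 200.07, 199.73, 196.01, 239.07, 162.11])

def coefficients(missing, n_blocks=10, n_treatments=8):
    """Coefficients of 40 x d(residual SS)/d(missing value): +N for the value itself,
    -N/n_treatments for a value in the same block (a block holds n_treatments plots),
    -N/n_blocks for a value under the same treatment, and +1 throughout from G^2/N."""
    N = n_blocks * n_treatments
    keys = list(missing)
    A = np.ones((len(keys), len(keys)))
    for r, u in enumerate(keys):
        for s, v in enumerate(keys):
            A[r, s] += (N * (u == v)
                        - (N // n_treatments) * (missing[u][0] == missing[v][0])
                        - (N // n_blocks) * (missing[u][1] == missing[v][1]))
    return A

def gauss_seidel(A, rhs, start, sweeps):
    """Solve A x = rhs by sweeping through the equations, always using the latest values."""
    x = np.full(len(rhs), float(start))
    for _ in range(sweeps):
        for r in range(len(rhs)):
            x[r] = (rhs[r] - A[r] @ x + A[r, r] * x[r]) / A[r, r]
    return x

A = coefficients(MISSING)
print(np.round(gauss_seidel(A, RHS, start=3.15, sweeps=1)[:3], 2))  # a, b, c: [2.92 2.62 3.69]
print(np.round(np.linalg.solve(A, RHS), 2))                          # exact solution of the nine equations
```

Each equation has a diagonal coefficient (63) larger than the sum of the others in absolute value, so the sweeps settle quickly, as the text found.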


These are our estimates of missing yields. The treatment means are found to be:

Treatment     1       2       3       4       5       6       7       8
Mean        3.009   2.828   3.341   3.788   3.140   3.120   2.883   3.308
24.19. The question now arises how we may analyse the variance of data for which
missing values have been estimated in this way.

The original data provided a classification with unequal numbers in subclasses and
can be analysed by the methods given earlier in the chapter; except that, since no subclass
contains more than one member, we cannot find a residual sum of squares within subclasses
based on N − pq d.f. (N − pq, in fact, is a negative number.) For instance,
regarding the data as a one-way classification with pq − r classes, we shall have an analysis
of this type:

                      Sums of squares   d.f.
Between classes ♦                       p + q − 2
Residual                                (p − 1)(q − 1) − r
Total                                   pq − r − 1                 (24.40)


The effect of the two classifications separately can be dealt with in the manner of 
Example 24.1. 


24.20. Two simplifications are possible. In the first place, since the minimisation 
of the residual is the same for the original data as for the data completed by estimates of 
missing values, we can use the latter to compute the residual precisely as for an orthogonal 
case, which simplifies the arithmetic. 

Secondly, it appears that to an adequate approximation we may substitute the estimated
values for missing values and analyse the resulting material in the ordinary way
as if it were orthogonal. If the proportion of missing values is high this approximation 
may perhaps break down, and in practice we should probably regard the experiment as 
ruined. More usually only a few records are missing, and the effect of replacing them by 
estimates is hardly likely to affect judgments of significance seriously. 

Example 24.6 

Continuing the analysis of the data of the previous example, we find, for the total sum
of squares, 32.1012 with 70 d.f. The analysis of the completed data, that is to say the original
data plus the estimates of missing values, is as follows:

                          Sum of Squares.   d.f.   Quotient.
Between blocks                9.7176          9     1.0797
Between treatments            6.5812          7     0.9402
Residual                     17.6902         54     0.3276
Totals                       33.9890         70

♦ It is assumed that no row or column in the two-way classification is entirely empty. If it were,
we should have to ignore it and confine attention to the remaining arrays.
Treating the original data as a case of unequal class numbers we find:

                                  Sum of Squares.   d.f.   Quotient.
Between blocks and treatments        14.4110         16     0.9007
Residual                             17.6902         54     0.3276
Totals                               32.1012         70

For blocks only:

                                  Sum of Squares.   d.f.   Quotient.
Between blocks                        8.5690          9     0.9521
Remainder                             5.8420          7     0.8346
Blocks and treatments                14.4110         16

For treatments only:

                                  Sum of Squares.   d.f.   Quotient.
Between treatments                    6.2648          7     0.8950
Remainder                             8.1462          9     0.9051
Blocks and treatments                14.4110         16


Whether we use the analysis of completed data or the more exact form, we see that 
differences between blocks and between treatments are significant as judged by the residual 
variance. The two analyses are, in fact, not very different, and even with as many as nine 
missing values out of 80 we should not err by substituting estimated values and treating 
the data as orthogonal. 

Relationship with Regression Analysis 

24.21. The general n-way classifications to which variance-analysis may be applied
are not necessarily determined by a measurable variate. As for contingency tables, rows 
or columns can be interchanged without affecting the analysis. We can, however, regard 
a multivariate frequency table as an n-way classification and apply variance-analysis to 
it ; and just as regression and correlation analysis provide a refinement on contingency 
analysis because of the arrangement of the classes in order by reference to a variate, so we 
may to some extent refine the analysis of variance in such a case. 

24.22. Consider in the first instance a p × q table of frequencies in the form of a
correlation table. We will suppose the A-classification to be according to the variate x
and the B-classification according to y. Let us now consider the hypothesis that the data
emanate from a normal bivariate population with zero correlation (or, somewhat more
generally, that for any given y the x's are distributed normally with the same mean and
variance). We can then regard the data as a one-way classification according to y with
unequal frequencies and analyse the variance in the usual form:

                 Sum of Squares.                                                              d.f.     Quotient.
Between classes  $\sum_j n_j (\bar{x}_{\cdot j} - \bar{x})^2 = N\eta^2 \operatorname{var} x$          q − 1    $N\eta^2 \operatorname{var} x/(q - 1)$
Residual         $\sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})^2 = N(1 - \eta^2) \operatorname{var} x$    N − q    $N(1 - \eta^2) \operatorname{var} x/(N - q)$
Totals           $N \operatorname{var} x$                                                        N − 1

Here $\bar{x}_{\cdot j}$ is the mean of x-values in the jth y-class, $\bar{x}$ is the mean of all N values, $x_{ij}$ is the
variate-value in the ith x-class and jth y-class, and there are q y-classes. The quotients
are expressible in terms of the correlation ratio $\eta$ of x on y, as shown (cf. 14.23, vol. I, p. 361).

Now, on our hypothesis, the sums of squares between classes and the residual are
independently distributed in the Type III form, and hence the variance ratio

$$\frac{\eta^2}{1 - \eta^2} \cdot \frac{N - q}{q - 1} \qquad (24.41)$$

can be tested in Fisher's distribution with $\nu_1 = q - 1$, $\nu_2 = N - q$. This is the test we
gave in 14.25 (vol. I, p. 353) and it is reached by an argument of essentially the same
kind.
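The following is a minimal sketch of (24.41) in Python with NumPy. It assumes ungrouped observations x together with the value of the classifying variate y for each one, so that equal values of y define the classes. The function name is ours and the small data set is made up.

```python
import numpy as np

def correlation_ratio_test(x, y):
    """eta^2 of x on y and the variance ratio (24.41), with (q - 1, N - q) d.f."""
    x = np.asarray(x, dtype=float)
    _, idx = np.unique(y, return_inverse=True)       # the y-classes
    idx = idx.ravel()
    N, q = len(x), int(idx.max()) + 1
    n = np.bincount(idx)
    means = np.bincount(idx, weights=x) / n
    eta2 = (n * (means - x.mean()) ** 2).sum() / ((x - x.mean()) ** 2).sum()
    F = eta2 / (1 - eta2) * (N - q) / (q - 1)
    return float(eta2), float(F), (q - 1, N - q)

eta2, F, dof = correlation_ratio_test([1, 2, 3, 4, 5, 6], [0, 0, 0, 1, 1, 1])
print(round(eta2, 4), round(F, 4), dof)   # 0.7714 13.5 (1, 4)
```

The variance ratio equals the between-class quotient divided by the residual quotient in the table above. Here these are 13.5/1 and 4/4.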


24.23. Now suppose that our p × q table is normal but correlated; or, somewhat
more generally, that the values in arrays of constant y are normally distributed with the
same variance but with means which vary linearly with y, say

$$m + b y_j. \qquad (24.42)$$

Then our data can be represented by the form

$$x_{ij} = m + b y_j + \epsilon_{ij}, \qquad (24.43)$$

where the $\epsilon$'s are distributed normally with zero mean and the same variance v. Apart
from the constant m, the only unknown here is the constant b. Our least-squares estimates
(measuring from the means of x and y) now lead to the familiar form for the regression
coefficient

$$b = \frac{\sum x_{ij} y_j}{\sum y_j^2}, \qquad (24.44)$$

where summation takes place over all values observed. This is, of course, equivalent to

$$b = \frac{\operatorname{cov}(x, y)}{\operatorname{var} y}. \qquad (24.45)$$
Further, the reduction in sum of squares attributable to fitting the constant b is

$$N b \operatorname{cov}(x, y) = N \frac{\{\operatorname{cov}(x, y)\}^2}{\operatorname{var} y} = N r^2 \operatorname{var} x, \qquad (24.46)$$

where r is the correlation coefficient of the sample.

Our analysis of variance may then be written:

TABLE 24.7

Analysis of Variance of a Correlation Table

                                                   Sum of Squares.                          d.f.     Quotient.
Regression constant b                              $N r^2 \operatorname{var} x$                       1        $N r^2 \operatorname{var} x$
Between classes (after regression is eliminated)   $N(\eta^2 - r^2) \operatorname{var} x$             q − 2    $N(\eta^2 - r^2) \operatorname{var} x/(q - 2)$
Residual                                           $N(1 - \eta^2) \operatorname{var} x$               N − q    $N(1 - \eta^2) \operatorname{var} x/(N - q)$
Totals                                             $N \operatorname{var} x$                           N − 1

This analysis gives us a test of the significance of the correlation coefficient in samples
from an uncorrelated population and also of linearity of regression.

In fact, if the parent correlation is zero, the parent value of b is zero and the quotient
due to b is independent of the sum of the other items in the analysis. Thus the ratio

$$\frac{N r^2 \operatorname{var} x}{N(1 - r^2) \operatorname{var} x/(N - 2)} = \frac{r^2 (N - 2)}{1 - r^2} \qquad (24.47)$$

is distributed in Fisher's form with $\nu_1 = 1$, $\nu_2 = N - 2$. This is equivalent to saying that

$$\frac{r\sqrt{N - 2}}{\sqrt{1 - r^2}} \qquad (24.48)$$

is distributed in "Student's" form with N − 2 d.f., which brings us back by a different
route to the test given in 14.15 (vol. I, p. 342).
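The equivalence of (24.47) and (24.48) is easy to confirm. The sketch below, in plain Python, computes both for a hypothetical r = 0.5 from N = 27 pairs and checks that $t^2 = F$. The function name is ours.

```python
import math

def correlation_tests(r, N):
    """Variance ratio (24.47) with (1, N - 2) d.f. and Student's t (24.48) with N - 2 d.f."""
    F = r * r * (N - 2) / (1 - r * r)
    t = r * math.sqrt(N - 2) / math.sqrt(1 - r * r)
    return F, t

F, t = correlation_tests(0.5, 27)
print(round(F, 4), round(t, 4), math.isclose(t * t, F))   # 8.3333 2.8868 True
```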

24.24. Secondly, if we assume that the parent correlation is not zero but the regression
is linear, the sum of squares between classes after regression is eliminated is independent
of the residual in Table 24.7, and hence the ratio

$$\frac{N(\eta^2 - r^2) \operatorname{var} x/(q - 2)}{N(1 - \eta^2) \operatorname{var} x/(N - q)} = \frac{\eta^2 - r^2}{1 - \eta^2} \cdot \frac{N - q}{q - 2} \qquad (24.49)$$

is distributed in Fisher's form with $\nu_1 = q - 2$, $\nu_2 = N - q$. This test (due to Fisher
himself) gives a test of linearity of regression in the normal case.

It should be noticed that this test is only approximate if the classification is one of
a normal population with broad groupings. If correlation exists, the distribution of a
bivariate normal sample in an array of finite width is not exactly normal, being the sum
of a number of normal distributions with slightly different means. Unless the grouping
is very coarse, this is not likely to invalidate tests of significance in practice.
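The linearity test (24.49) can be sketched in Python with NumPy as follows. As before, this assumes ungrouped observations with the classes defined by the distinct values of y. The function name and the two small data sets are hypothetical. When the class means lie exactly on a straight line, $\eta^2 = r^2$ and the ratio vanishes. When they do not, $\eta^2$ exceeds $r^2$.

```python
import numpy as np

def linearity_test(x, y):
    """Fisher's test of linearity of regression of x on y, (24.49): returns eta^2, r^2 and
    the variance ratio with (q - 2, N - q) d.f.; classes are the distinct values of y."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    _, idx = np.unique(y, return_inverse=True)
    idx = idx.ravel()
    N, q = len(x), int(idx.max()) + 1
    n = np.bincount(idx)
    means = np.bincount(idx, weights=x) / n
    eta2 = (n * (means - x.mean()) ** 2).sum() / ((x - x.mean()) ** 2).sum()
    r2 = np.corrcoef(x, y)[0, 1] ** 2
    F = (eta2 - r2) / (1 - eta2) * (N - q) / (q - 2)
    return float(eta2), float(r2), float(F)

y = [1, 1, 2, 2, 3, 3]
print(linearity_test([2.1, 1.9, 4.2, 3.8, 6.05, 5.95], y))   # class means 2, 4, 6: F is zero up to rounding
print(linearity_test([1.1, 0.9, 4.2, 3.8, 1.05, 0.95], y))   # class means 1, 4, 1: r^2 = 0, F large
```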


24.25. Consider now the general regression formula for p variates,

$$x_1 = b_2 x_2 + b_3 x_3 + \cdots + b_p x_p. \qquad (24.50)$$

If we assume that the residuals $x_1 - b_2 x_2 - \cdots - b_p x_p$ are distributed normally with
constant variance, our least-squares estimates of the regression coefficients are those given
by the usual theory, and the fitting of (p − 1) constants reduces the sum of squares by
$N R^2 \operatorname{var} x_1$, where R is the multiple correlation coefficient (cf. 15.16, vol. I, p. 380). We
then have the analysis:

                                          Sum of Squares.                      d.f.     Quotient.
Between classes (regression constants)    $N R^2 \operatorname{var} x_1$                  p − 1    $N R^2 \operatorname{var} x_1/(p - 1)$
Residual                                  $N(1 - R^2) \operatorname{var} x_1$             N − p    $N(1 - R^2) \operatorname{var} x_1/(N - p)$
Totals                                    $N \operatorname{var} x_1$                      N − 1

If the regression is in fact linear of type (24.50), the residual quotient is independent of
that due to fitting regression constants, and the hypothesis that the parent regression coefficients
are all zero may be tested by means of the ratio

$$\frac{R^2}{1 - R^2} \cdot \frac{N - p}{p - 1}, \qquad (24.51)$$

which is distributed in Fisher's form with $\nu_1 = p - 1$, $\nu_2 = N - p$. This brings us to
the distribution of $R^2$ given in 15.20.
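The sketch below computes $R^2$ and the ratio (24.51) in Python with NumPy, fitting the means together with the p − 1 regression coefficients by least squares. The function name and the simulated data are hypothetical. With a single regressor (p = 2), $R^2$ reduces to $r^2$ and (24.51) to (24.47).

```python
import numpy as np

def multiple_correlation_test(x1, others):
    """R^2 of x1 on the other p - 1 variates (means fitted) and the ratio (24.51), (p - 1, N - p) d.f."""
    x1 = np.asarray(x1, dtype=float)
    X = np.column_stack([np.ones(len(x1)), np.asarray(others, dtype=float)])
    coef, *_ = np.linalg.lstsq(X, x1, rcond=None)
    resid = x1 - X @ coef
    R2 = 1 - (resid ** 2).sum() / ((x1 - x1.mean()) ** 2).sum()   # 1 - residual / (N var x1)
    N, p = len(x1), X.shape[1]                                     # p = 1 + number of regressors
    F = R2 / (1 - R2) * (N - p) / (p - 1)
    return float(R2), float(F), (p - 1, N - p)

rng = np.random.default_rng(3)
x2, x3 = rng.normal(size=30), rng.normal(size=30)
x1 = 1 + 0.5 * x2 - 0.3 * x3 + rng.normal(size=30)               # hypothetical data
print(multiple_correlation_test(x1, np.column_stack([x2, x3])))
```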


24.26. It is to be observed that in (24.50) we may choose the variates $x_2, \ldots, x_p$
as we please. In particular, we can take them to be polynomials of a single variate. From
this point of view the analysis of variance links up with the theory of regression analysis
given in Chapter 22. If the polynomials are orthogonal we can fit the constants b one
at a time, the fitting of any constant leaving unchanged the previous determination of those
of lower orders. The reduction in sum of squares for each constant can be separately
ascertained and corresponds to the loss of a further degree of freedom; and at any stage
we may test the residual variance to see whether any particular term is worth while in the
sense that it makes a significant contribution to the total variance. The exact test, of
course, depends on the usual assumptions of normality.
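The sketch below illustrates fitting one constant at a time, in Python with NumPy. In place of tabled orthogonal polynomials, which is our substitution, it orthonormalises the powers 1, t, t², ... over the observed t by a QR decomposition. The squared coefficients are then the separate reductions in sum of squares. Their cumulative sums agree with the reductions obtained by refitting an ordinary polynomial of each degree. The data are made up.

```python
import numpy as np

def orthogonal_polynomial_reductions(t, x, degree):
    """Reduction in sum of squares due to each polynomial term of degree 1..degree, the terms
    being orthogonalised over the observed t so that each is fitted independently."""
    V = np.vander(np.asarray(t, dtype=float), degree + 1, increasing=True)  # columns 1, t, t^2, ...
    Q, _ = np.linalg.qr(V)                   # orthonormal; first k columns span 1, t, ..., t^(k-1)
    coef = Q.T @ np.asarray(x, dtype=float)
    return coef[1:] ** 2                     # drop the constant term (the mean)

t = np.arange(10)
x = np.array([2.0, 2.9, 4.2, 4.8, 6.1, 7.3, 7.9, 9.4, 9.8, 11.2])   # made-up values
red = orthogonal_polynomial_reductions(t, x, 3)
print(np.round(red, 4))   # linear, quadratic and cubic contributions, each on 1 d.f.
```

Each contribution can then be compared, on 1 d.f., with the residual quotient left after the terms fitted so far.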


24.27. The reader is now in a position to see a number of statistical topics which
on the surface appear to be distinct as parts of a single theory. Regression analysis, with
its subsidiary of correlation analysis, proceeds by the successive fitting of constants by
least squares. For the normal case this is equivalent to estimation by maximum likelihood.
Partial and multiple regression, together with curvilinear regression, can all be subsumed
under this central idea. The fitting of each constant splits off a separate contribution to
the total variance which, under certain hypotheses, is independent of the others. Variance-analysis
proceeds in much the same way, but is more general in the sense that it can deal
with the classification of values, however determined. Our various exact tests of significance
(of homogeneity in variance, of linearity of regression, of significance of correlations
in uncorrelated material, of the difference of two means where variances are equal, of the
correlation ratios, of the multiple correlation coefficient) all derive ultimately from Fisher's
distribution of the variance-ratio in the normal case.


The Analysis of Covariance 

24.28. Suppose that we have a one-way classification, possibly with unequal numbers,
and that in each class the members present values not of a single variate, such as we have
considered up to now, but pairs of variate-values typified by $(x_{ij}, y_{ij})$, j referring as usual
to class and i to the number within the class. By the ordinary methods of variance-analysis
we can discuss the effect of classification either on the x-variate or on the y-variate; but
there also arises for consideration the effect of class-membership on the covariation of
x and y. This leads us to an extension of the analysis of variance to that of covariance.


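24.29. By an easy extension of the results for a single variate we have, analogously to

$$\sum_{i,j} (x_{ij} - \bar{x}_{\cdot\cdot})^2 = \sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})^2 + \sum_j n_j (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})^2,$$

the equation in product terms

$$\sum_{i,j} (x_{ij} - \bar{x}_{\cdot\cdot})(y_{ij} - \bar{y}_{\cdot\cdot}) = \sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})(y_{ij} - \bar{y}_{\cdot j}) + \sum_j n_j (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}). \qquad (24.52)$$

If we consider the whole sample as homogeneous the correlation between x and y is given by

$$\frac{\sum_{i,j} (x_{ij} - \bar{x}_{\cdot\cdot})(y_{ij} - \bar{y}_{\cdot\cdot})}{\sqrt{\sum_{i,j} (x_{ij} - \bar{x}_{\cdot\cdot})^2 \sum_{i,j} (y_{ij} - \bar{y}_{\cdot\cdot})^2}}. \qquad (24.53)$$

We have also the correlation between means of classes

$$\frac{\sum_{i,j} (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})}{\sqrt{\sum_{i,j} (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})^2 \sum_{i,j} (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2}}, \qquad (24.54)$$

and may calculate a correlation of residuals within classes

$$\frac{\sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})(y_{ij} - \bar{y}_{\cdot j})}{\sqrt{\sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})^2 \sum_{i,j} (y_{ij} - \bar{y}_{\cdot j})^2}}. \qquad (24.55)$$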
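The following sketch, in Python with NumPy and with hypothetical data and helper names, computes the three sums of products of (24.52) and the three correlations (24.53) to (24.55). The sums of products add up exactly, while the correlations do not. In the between-means terms each class mean is counted once per member of its class, as in (24.52).

```python
import numpy as np

def covariance_split(x, y, groups):
    """Sums of products (24.52) and correlations (24.53)-(24.55): total, within classes,
    and between class means (each mean counted once per member of its class)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    idx = idx.ravel()
    n = np.bincount(idx)
    mx = (np.bincount(idx, weights=x) / n)[idx]      # class mean of x for each member
    my = (np.bincount(idx, weights=y) / n)[idx]
    parts = {"total": (x - x.mean(), y - y.mean()),
             "within": (x - mx, y - my),
             "between": (mx - x.mean(), my - y.mean())}
    products = {name: float(u @ v) for name, (u, v) in parts.items()}
    correlations = {name: float(u @ v / np.sqrt((u @ u) * (v @ v)))
                    for name, (u, v) in parts.items()}
    return products, correlations

# Hypothetical data: three classes of three members each.
x = [1, 2, 3, 4, 6, 8, 2, 3, 4]
y = [2, 1, 4, 5, 7, 6, 1, 3, 2]
groups = [0, 0, 0, 1, 1, 1, 2, 2, 2]
products, correlations = covariance_split(x, y, groups)
print(products)        # total equals within + between
print(correlations)    # the three correlations are not additive
```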


24.30. If there is heterogeneity present we should expect these correlations to differ;
and similarly for the three kinds of regression of y on x, such as

$$\frac{\sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})(y_{ij} - \bar{y}_{\cdot\cdot})}{\sum_{i,j} (x_{ij} - \bar{x}_{\cdot j})^2}. \qquad (24.56)$$

The three correlations of (24.53)-(24.55) are, however, not additive, like sums of squares;
nor are the corresponding regressions. The covariances expressed by (24.52) are additive,
but there is no simple test, such as exists for variance-ratios, to determine the significance
of differences or ratios of covariances. Covariance analysis, however, is not primarily
designed to test independence, but to examine whether there is any variation according
to class between the regressions of y on x within and between classes. Let us suppose
that there is some linear relation of the form


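$$Y - \mu_y = \beta (X - \mu_x). \qquad (24.57)$$

Following the notation of E. S. Pearson, we write

$$C_{11j} = \sum_i (x_{ij} - \bar{x}_{\cdot j})^2, \quad C_{22j} = \sum_i (y_{ij} - \bar{y}_{\cdot j})^2, \quad C_{12j} = \sum_i (x_{ij} - \bar{x}_{\cdot j})(y_{ij} - \bar{y}_{\cdot j}), \qquad (24.58)$$

$$C_{11a} = \sum_j C_{11j}, \quad C_{22a} = \sum_j C_{22j}, \quad C_{12a} = \sum_j C_{12j}, \qquad (24.59)$$

$$C_{11m} = \sum_j n_j (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})^2, \quad C_{22m} = \sum_j n_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2, \quad C_{12m} = \sum_j n_j (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})(\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot}), \qquad (24.60)$$

and $C_{110}$, $C_{220}$, $C_{120}$ for the corresponding total sums of squares and products. We may
then exhibit the composition of the total sums of squares and products in the form of Table
24.8. The arithmetic of the analysis follows that of ordinary variance-analysis. We
shall give an example presently.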


TABLE 24.8

Analysis of Variance and Covariance for One-Way Classification: Sums of Squares and
Products and Regression Coefficients.

Variation.          d.f.       Sum of Squares,   Sum of Squares,   Sum of        Regression
                               x-variate.        y-variate.        Products.     Coefficients.
Within jth group    $n_j - 1$    $C_{11j}$           $C_{22j}$           $C_{12j}$       $b_j = C_{12j}/C_{11j}$
Within groups       N − p      $C_{11a}$           $C_{22a}$           $C_{12a}$       $b_a = C_{12a}/C_{11a}$
Between groups      p − 1      $C_{11m}$           $C_{22m}$           $C_{12m}$       $b_m = C_{12m}/C_{11m}$
Totals              N − 1      $C_{110}$           $C_{220}$           $C_{120}$       $b_0 = C_{120}/C_{110}$

We now suppose that, apart from the regression effects represented by (24.57), the
variation of y is normal with constant variance v. We can then compile various estimates
of v from the residual variation after the effect of fitting regression constants has been
removed. For instance, within classes we have for the estimator of v, with N − 2p degrees
of freedom,

$$\frac{1}{N - 2p} \sum_j \Big( C_{22j} - \frac{C_{12j}^2}{C_{11j}} \Big) = \frac{S_1}{N - 2p}, \text{ say.}$$

The number of degrees of freedom follows from the fact that we have fitted a mean and
a regression coefficient to each of p classes, making a reduction of 2p in all. We then obtain
Table 24.9.


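TABLE 24.9

Analysis of Covariance for One-Way Classification with Linear Regressions.

Variation due to                                           d.f.         Sum of Squares.

Deviations from linear regressions within classes          N − 2p
    $\sum_{i,j} \{y_{ij} - \bar{y}_{\cdot j} - b_j (x_{ij} - \bar{x}_{\cdot j})\}^2 = \sum_j (C_{22j} - b_j C_{12j}) = S_1$

Differences among regressions                              p − 1
    $\sum_{i,j} (b_j - b_a)^2 (x_{ij} - \bar{x}_{\cdot j})^2 = \sum_j b_j C_{12j} - b_a C_{12a} = S_2$

Deviations within classes from linear regression $b_a$      N − p − 1
    $\sum_{i,j} \{y_{ij} - \bar{y}_{\cdot j} - b_a (x_{ij} - \bar{x}_{\cdot j})\}^2 = C_{22a} - b_a C_{12a} = S_1 + S_2$

Deviations between classes from linear regression $b_m$     p − 2
    $\sum_j n_j \{\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot} - b_m (\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})\}^2 = C_{22m} - b_m C_{12m} = S_3$

Differences between $b_a$ and $b_m$                          1
    $\sum_{i,j} \{(b_a - b_0)(x_{ij} - \bar{x}_{\cdot j}) + (b_m - b_0)(\bar{x}_{\cdot j} - \bar{x}_{\cdot\cdot})\}^2 = (b_a - b_m)^2 \dfrac{C_{11a} C_{11m}}{C_{110}} = S_4$

Total deviation from linear regression $b_0$                N − 2
    $\sum_{i,j} \{y_{ij} - \bar{y}_{\cdot\cdot} - b_0 (x_{ij} - \bar{x}_{\cdot\cdot})\}^2 = C_{220} - b_0 C_{120}$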
The reader will probably find it useful to check the expressions in the third column of
Table 24.9 and to examine how the sum of squares of deviations from the regression line
of the whole is analysed into the constituent items. In fact $S_1 + S_2 + S_3 + S_4 = C_{220} - b_0 C_{120}$,
and correspondingly (N − 2p) + (p − 1) + (p − 2) + 1 = N − 2.
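The sketch below carries out that check numerically in Python with NumPy. The helper names and the data are hypothetical; the class sizes 10, 15 and 10 merely echo Example 24.7. It builds the C's of (24.58) to (24.60), obtaining the between-group sums by subtraction using the additivity (24.52). It then forms $S_1, \ldots, S_4$ as in Table 24.9 and compares their sum with the total deviation about $b_0$.

```python
import numpy as np

def class_sums(u, v, idx):
    """Sum of products of deviations from class means, one value per class (C_11j, C_22j, C_12j)."""
    n = np.bincount(idx)
    mu, mv = np.bincount(idx, weights=u) / n, np.bincount(idx, weights=v) / n
    return np.bincount(idx, weights=(u - mu[idx]) * (v - mv[idx]))

def table_24_9(x, y, groups):
    """S1..S4 of Table 24.9 and the total deviation from the overall regression b0."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    _, idx = np.unique(groups, return_inverse=True)
    idx = idx.ravel()
    C11j, C22j, C12j = class_sums(x, x, idx), class_sums(y, y, idx), class_sums(x, y, idx)
    C11a, C22a, C12a = C11j.sum(), C22j.sum(), C12j.sum()           # within groups
    dx, dy = x - x.mean(), y - y.mean()
    C110, C220, C120 = dx @ dx, dy @ dy, dx @ dy                     # totals
    C11m, C22m, C12m = C110 - C11a, C220 - C22a, C120 - C12a         # between groups, by (24.52)
    bj, ba, bm, b0 = C12j / C11j, C12a / C11a, C12m / C11m, C120 / C110
    S1 = float(np.sum(C22j - bj * C12j))
    S2 = float(np.sum(bj * C12j) - ba * C12a)
    S3 = float(C22m - bm * C12m)
    S4 = float((ba - bm) ** 2 * C11a * C11m / C110)
    return (S1, S2, S3, S4), float(C220 - b0 * C120)

rng = np.random.default_rng(4)
groups = np.repeat([0, 1, 2], [10, 15, 10])         # hypothetical class sizes
x = rng.normal(60, 8, size=35) + 3 * groups
y = 0.8 * x + 2 * groups + rng.normal(0, 5, size=35)
(S1, S2, S3, S4), total = table_24_9(x, y, groups)
print(S1 + S2 + S3 + S4, total)                     # the two agree
```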


24.31. Suppose now that we wish to test whether the relationship between x and y can be represented by the formula (24.67), and that there is no material class-effect present. Then $S_1$ of Table 24.9 should be an unbiassed estimator of $(N-2p)v$ and should be independent of the residual estimator $S_2 + S_3 + S_4$, which has $2p-2$ d.f. We may therefore test the hypothesis by the ratio

$$\frac{S_1}{S_2+S_3+S_4}\cdot\frac{2p-2}{N-2p}, \qquad \nu_1 = N-2p, \quad \nu_2 = 2p-2. \qquad (24.61)$$


If this variance ratio is insignificant we consider next whether the regressions differ in the p classes. For this purpose we compare the estimator derived from $S_2$ with that based on $S_1$; i.e. the ratio

$$\frac{S_2}{S_1}\cdot\frac{N-2p}{p-1}, \qquad \nu_1 = p-1, \quad \nu_2 = N-2p \qquad (24.62)$$

will be significant if differences are to be regarded as real.

If this ratio is not significant, $S_1$ and $S_2$ may be pooled. Comparison of their sum with $S_3$ will afford a test whether the relation between group means is linear. The ratio for this purpose is

$$\frac{S_1+S_2}{S_3}\cdot\frac{p-2}{N-p-1}, \qquad \nu_1 = N-p-1, \quad \nu_2 = p-2. \qquad (24.63)$$


Finally, even if this ratio is not significant, it does not follow that the common regression 
within groups is the same as the regression of the means of groups. To test this point 
we consider the ratio 


$$\frac{S_1+S_2}{N-p-1}\Big/ S_4, \qquad \nu_1 = N-p-1, \quad \nu_2 = 1. \qquad (24.64)$$
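The four ratios can be collected in one helper. This is a sketch only: the function name `covariance_ratios` is hypothetical, the ratios are oriented exactly as printed in (24.61) to (24.64), and the sums plugged in are those of Example 24.7 recomputed from the group totals ($S_1 = 1048.1$, $S_2 = 175.4$, $S_3 = 835.6$, $S_4 = 19.3$, with N = 35 and p = 3). No significance points are computed; the ratios would still be referred to tables of the variance ratio.

```python
# Sketch: the variance ratios (24.61)-(24.64), oriented as printed.


def covariance_ratios(s1, s2, s3, s4, n_total, p):
    """Return {equation: (ratio, nu1, nu2)} for p classes and n_total observations.

    (24.63) needs p > 2, since S3 has p - 2 degrees of freedom.
    """
    N = n_total
    return {
        "24.61": ((s1 / (N - 2 * p)) / ((s2 + s3 + s4) / (2 * p - 2)), N - 2 * p, 2 * p - 2),
        "24.62": ((s2 / (p - 1)) / (s1 / (N - 2 * p)), p - 1, N - 2 * p),
        "24.63": (((s1 + s2) / (N - p - 1)) / (s3 / (p - 2)), N - p - 1, p - 2),
        "24.64": (((s1 + s2) / (N - p - 1)) / s4, N - p - 1, 1),
    }


# Sums for Example 24.7: N = 35 recruits in p = 3 groups.
ratios = covariance_ratios(1048.1, 175.4, 835.6, 19.3, n_total=35, p=3)
for eq, (ratio, nu1, nu2) in ratios.items():
    print(eq, round(ratio, 3), nu1, nu2)
```

A ratio smaller than unity, such as the one from (24.61) here, may be inverted with $\nu_1$ and $\nu_2$ interchanged before the tables are consulted, since the reciprocal of a variance ratio with $(\nu_1, \nu_2)$ degrees of freedom is a variance ratio with $(\nu_2, \nu_1)$.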


Example 24.7 

A number of recruits are given a preliminary test to ascertain their suitability for a 
certain course of training. At the end of the training course they undergo a proficiency 
test. The marks for three groups of recruits from three different towns are — 

Group 1  Preliminary : 45, 50, 56, 58, 59, 60, 62, 64, 65, 75
         Proficiency : 46, 60, 52, 46, 48, 50, 55, 63, 58, 64

Group 2  Preliminary : 44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73
         Proficiency : 48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81

Group 3  Preliminary : 47, 52, 59, 60, 63, 66, 68, 69, 74, 76
         Proficiency : 43, 56, 51, 72, 60, 61, 55, 74, 72, 80.

We are interested here in the efficiency of the preliminary test as a predictor of the proficiency test. We therefore consider the regression of the marks obtained in the latter (y) on those obtained in the former (x). We are, however, also very much interested in the question whether the regressions are the same, apart from purely sampling effects, in the three groups. Such a matter would naturally arise, for instance, if we were thinking of applying the same rejection standards in preliminary tests to all recruits, irrespective of their town of origin.

Our scores are given to the nearest unit, and hence the variates are discontinuous. 
We will neglect this effect and assume that the scores are distributed approximately 
normally. 


About origin x = y = 50 the sums of squares and cross-products are:

        | n  | Σ(x) | Σ(y) | Σ(x²) | Σ(y²) | Σ(xy)
Group 1 | 10 | 94   | 42   | 1496  | 594   | 694
Group 2 | 15 | 162  | 257  | 2802  | 6101  | 3989
Group 3 | 10 | 134  | 124  | 2556  | 2776  | 2422


We can then calculate the quantities C. For instance,

$$C_{111} = 1496 - \frac{94^2}{10} = 612.4,$$

$$C_{121} = 694 - \frac{94\times 42}{10} = 299.2,$$

$$C_{11a} = C_{111} + C_{112} + C_{113}, \text{ etc.}$$
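The corrected sums can be reproduced from the group totals. In this sketch the dictionary layout, the key names and the helper `corrected` are assumptions made for illustration; the third group's Σx² = 2556 and Σy² = 2776 are the values implied by $C_{113} = 760.4$ and $C_{223} = 1238.4$ in Table 24.10. As in the text, the between-groups quantities are obtained by subtraction.

```python
# Sketch: the quantities C of Table 24.10 from totals about the origin x = y = 50.
totals = {
    1: dict(n=10, sx=94, sy=42, sxx=1496, syy=594, sxy=694),
    2: dict(n=15, sx=162, sy=257, sxx=2802, syy=6101, sxy=3989),
    3: dict(n=10, sx=134, sy=124, sxx=2556, syy=2776, sxy=2422),
}


def corrected(t):
    """(C11, C22, C12) for one set of totals; the working origin cancels out."""
    n = t["n"]
    c11 = t["sxx"] - t["sx"] ** 2 / n
    c22 = t["syy"] - t["sy"] ** 2 / n
    c12 = t["sxy"] - t["sx"] * t["sy"] / n
    return c11, c22, c12


within = {g: corrected(t) for g, t in totals.items()}
c11a, c22a, c12a = (sum(c[k] for c in within.values()) for k in range(3))
# Totals row: pool the raw totals of all groups, then correct once.
pooled = {key: sum(t[key] for t in totals.values()) for key in totals[1]}
c110, c220, c120 = corrected(pooled)
# Between-groups row by subtraction.
c11m, c22m, c12m = c110 - c11a, c220 - c22a, c120 - c12a
print(within[1], (c11a, c22a, c12a), (c11m, c22m, c12m))
```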
We find the following table in the form of Table 24.8 : — 


TABLE 24.10

Analysis of Variance and Covariance for Data of Example 24.7: Sums of Squares and Products and Regressions

Variation | d.f. | Sum of Squares, $x^2$ | Sum of Squares, $y^2$ | Sum of Products, $xy$ | Regressions
Within first group | 9 | $C_{111} = 612.4$ | $C_{221} = 417.6$ | $C_{121} = 299.2$ | $b_1 = 0.4886$
Within second group | 14 | $C_{112} = 1052.4$ | $C_{222} = 1697.73$ | $C_{122} = 1213.4$ | $b_2 = 1.1530$
Within third group | 9 | $C_{113} = 760.4$ | $C_{223} = 1238.4$ | $C_{123} = 760.4$ | $b_3 = 1.0000$
Within groups | 32 | $C_{11a} = 2425.2$ | $C_{22a} = 3353.73$ | $C_{12a} = 2273.0$ | $b_a = 0.9372$
Between groups | 2 | $C_{11m} = 83.09$ | $C_{22m} = 1005.01$ | $C_{12m} = 118.57$ | $b_m = 1.4270$
Totals | 34 | $C_{110} = 2508.29$ | $C_{220} = 4358.74$ | $C_{120} = 2391.57$ | $b_0 = 0.9535$
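Each regression coefficient in Table 24.10 is the ratio of a sum of products to the corresponding x sum of squares. The sketch below assumes the tabulated values, with the between-groups and totals rows carried to four decimals as recomputed from the group totals. It also checks that $b_0$ is the weighted mean of $b_a$ and $b_m$ with weights $C_{11a}$ and $C_{11m}$, which follows from $C_{110} = C_{11a} + C_{11m}$ and $C_{120} = C_{12a} + C_{12m}$.

```python
# Sketch: regression coefficients b = C12 / C11 for each row of Table 24.10.
# The between-groups and totals rows carry more decimals than the table prints.
rows = {
    "b1": (612.4, 299.2),
    "b2": (1052.4, 1213.4),
    "b3": (760.4, 760.4),
    "ba": (2425.2, 2273.0),
    "bm": (83.0857, 118.5714),
    "b0": (2508.2857, 2391.5714),
}
b = {name: c12 / c11 for name, (c11, c12) in rows.items()}
for name, value in b.items():
    print(name, f"{value:.4f}")

# b0 is the C11-weighted mean of ba and bm (within and between regressions).
c11a, c11m = rows["ba"][0], rows["bm"][0]
weighted = (b["ba"] * c11a + b["bm"] * c11m) / (c11a + c11m)
print(f"{weighted:.4f}")
```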


A comparison of the three regressions within groups indicates some heterogeneity. It looks as if the preliminary test is not such a good predictor for the first group as for the others. We may proceed to test the reality of this effect by constructing Table 24.11 on the lines of Table 24.9. For instance,

$$S_1 = (417.6 - 299.2\times 0.4886) + (\text{two similar terms}) = 1048.1.$$


We find:

TABLE 24.11

Analysis of Covariance of Data of Example 24.7: Linear Regressions

Variation | d.f. | Sums, S | Quotient
Deviations from regressions $b_j$ | 29 | $S_1 = 1048.1$ | 36.1
Differences among $b_j$ | 2 | $S_2 = 175.4$ | 87.7
Deviations from $b_a$ | 31 | $S_1 + S_2 = 1223.5$ | 39.5
Deviations of groups from $b_m$ | 1 | $S_3 = 835.6$ | 835.6
Difference between $b_a$ and $b_m$ | 1 | $S_4 = 19.3$ | 19.3
Totals | 33 | $S_1 + S_2 + S_3 + S_4 = 2078.4$ |
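Table 24.11 can be rebuilt from the raw marks. The sketch below uses the Example 24.7 marks in the form that reproduces the group totals (n, Σx, Σy, Σx², Σy², Σxy) of the table of sums; the code layout and helper name are illustrative only. Because nothing is rounded along the way, the results may differ from the printed figures in the first decimal (for instance $S_2$ comes out near 175.3 and $S_3$ near 835.8), presumably because the table was worked from rounded coefficients.

```python
# Sketch: Table 24.11 recomputed from the marks of Example 24.7.
prelim = [
    [45, 50, 56, 58, 59, 60, 62, 64, 65, 75],
    [44, 49, 52, 52, 58, 59, 60, 62, 63, 63, 66, 69, 70, 72, 73],
    [47, 52, 59, 60, 63, 66, 68, 69, 74, 76],
]
prof = [
    [46, 60, 52, 46, 48, 50, 55, 63, 58, 64],
    [48, 55, 45, 60, 65, 64, 69, 71, 77, 70, 75, 80, 72, 75, 81],
    [43, 56, 51, 72, 60, 61, 55, 74, 72, 80],
]


def corrected_sums(xs, ys):
    """(C11, C12, C22) about the sample means."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) ** 2 for x in xs),
            sum((x - mx) * (y - my) for x, y in zip(xs, ys)),
            sum((y - my) ** 2 for y in ys))


per_group = [corrected_sums(x, y) for x, y in zip(prelim, prof)]
c11a, c12a, c22a = (sum(c[k] for c in per_group) for k in range(3))
all_x = [x for group in prelim for x in group]
all_y = [y for group in prof for y in group]
c110, c120, c220 = corrected_sums(all_x, all_y)
c11m, c12m, c22m = c110 - c11a, c120 - c12a, c220 - c22a
b_a, b_m, b_0 = c12a / c11a, c12m / c11m, c120 / c110

s1 = sum(c22 - c12 ** 2 / c11 for c11, c12, c22 in per_group)
s2 = (c22a - b_a * c12a) - s1
s3 = c22m - b_m * c12m
s4 = (b_a - b_m) ** 2 * c11a * c11m / c110
N, p = len(all_x), len(prelim)

table = [
    ("Deviations from regressions b_j", N - 2 * p, s1),
    ("Differences among b_j", p - 1, s2),
    ("Deviations from b_a", N - p - 1, s1 + s2),
    ("Deviations of groups from b_m", p - 2, s3),
    ("Difference between b_a and b_m", 1, s4),
]
for label, df, ss in table:
    print(f"{label:32s} {df:3d} {ss:8.1f} {ss / df:7.1f}")
print(f"{'Total':32s} {N - 2:3d} {c220 - b_0 * c120:8.1f}")
```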



A comparison of the quotient 36.1 (29 d.f.) with the quotient of the remaining items, 257.6 (4 d.f.), indicates that there are real differences between classes. A single regression equation will not represent all three class-relations. A comparison of the deviations from regressions, 36.1 (29 d.f.), with the differences of regressions among themselves, 87.7 (2 d.f.), does not reject the hypothesis of equality of regressions within groups. We therefore compare the deviations from $b_a$, 39.5 (31 d.f.), with the deviations of groups from $b_m$, 835.6 (1 d.f.). This is significant, suggesting that the hypothesis of linearity of regression of group-means should be rejected.

The general result is to confirm our suspicion of heterogeneity. The correlation 
coefficients between x and y are — 


Within first group . . . . 0.592
Within second group . . . . 0.908
Within third group . . . . 0.784
Within groups . . . . . . 0.797
Between groups . . . . . 0.410
Total . . . . . . . . . . 0.722
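The correlations follow from $r = C_{12}/\sqrt{C_{11}C_{22}}$. This sketch uses the Table 24.10 entries, with the between-groups and totals rows carried to more decimals as recomputed from the group totals. With these figures the total row comes out at about 0.723 rather than the printed 0.722, a difference presumably due to rounding.

```python
import math

# Sketch: correlation r = C12 / sqrt(C11 * C22) for each row of Table 24.10.
# Tuples are (C11, C22, C12); between and total rows carry extra decimals.
rows = {
    "Within first group": (612.4, 417.6, 299.2),
    "Within second group": (1052.4, 1697.7333, 1213.4),
    "Within third group": (760.4, 1238.4, 760.4),
    "Within groups": (2425.2, 3353.7333, 2273.0),
    "Between groups": (83.0857, 1005.0095, 118.5714),
    "Total": (2508.2857, 4358.7429, 2391.5714),
}
r = {name: c12 / math.sqrt(c11 * c22) for name, (c11, c22, c12) in rows.items()}
for name, value in r.items():
    print(f"{name:20s} {value:.3f}")
```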


Again the deviations between groups stand out as indicating heterogeneity. 

24.32. The analysis of covariance may be extended to the case where there is more
than one independent variate. The regression coefficients are found in the usual way, 
and the sums of squares after regressions have been removed can be found and compared 
on the usual hypotheses. Suppose, for instance, there are two independent variates and 
a classification giving an analysis between classes and residual. We may represent the 
analysis thus : — 



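| | Sum of Squares | | | Sum of Products | |
Variation | d.f. | $x_1^2$ | $x_2^2$ | $y^2$ | $x_1x_2$ | $yx_1$ | $yx_2$
Between classes | $n$ | $A$ | $B$ | $C$ | $P$ | $Q$ | $R$
Residual | $n'$ | $A'$ | $B'$ | $C'$ | $P'$ | $Q'$ | $R'$
Totals | $n + n'$ | $A''$ | $B''$ | $C''$ | $P''$ | $Q''$ | $R''$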
Our regressions are then:

Variation | $b_1$ | $b_2$
Between classes | $\dfrac{BQ-PR}{AB-P^2}$ | $\dfrac{AR-PQ}{AB-P^2}$
Residual | $\dfrac{B'Q'-P'R'}{A'B'-P'^2}$ | $\dfrac{A'R'-P'Q'}{A'B'-P'^2}$
Totals | $\dfrac{B''Q''-P''R''}{A''B''-P''^2}$ | $\dfrac{A''R''-P''Q''}{A''B''-P''^2}$

The sums of squares C can then be reduced by eliminating regressions, i.e. by subtracting $Qb_1 + Rb_2$, giving

$$C - \frac{BQ^2-PQR}{AB-P^2} - \frac{AR^2-PQR}{AB-P^2} = \frac{ABC-AR^2-BQ^2-CP^2+2PQR}{AB-P^2}. \qquad (24.65)$$


This and the analogous quantities with primes give independent estimators of the 
variance of the residual element, and a comparison to test homogeneity may be made in 
the usual way. 
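Formula (24.65) can be checked against a directly computed residual sum of squares. The sketch assumes a single row of the analysis, with hypothetical data for $x_1$, $x_2$ and $y$ measured about their means. It forms A, B, C, P, Q, R as in 24.32 and compares three routes to the reduced sum of squares: $C - Qb_1 - Rb_2$, the closed form (24.65), and the sum of squared residuals.

```python
# Sketch: check (24.65) against a directly computed residual sum of squares.
# Hypothetical data for two independent variates x1, x2 and dependent y.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]
y = [3.1, 3.9, 6.2, 6.8, 9.9, 10.1]


def deviations(values):
    """Values measured from their mean."""
    m = sum(values) / len(values)
    return [v - m for v in values]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


d1, d2, dy = deviations(x1), deviations(x2), deviations(y)
A, B, C = dot(d1, d1), dot(d2, d2), dot(dy, dy)   # sums of squares
P, Q, R = dot(d1, d2), dot(dy, d1), dot(dy, d2)   # sums of products
D = A * B - P ** 2
b1 = (B * Q - P * R) / D
b2 = (A * R - P * Q) / D

reduced = C - (Q * b1 + R * b2)
closed_form = (A * B * C - A * R ** 2 - B * Q ** 2 - C * P ** 2 + 2 * P * Q * R) / D
direct = sum((e - b1 * u - b2 * v) ** 2 for e, u, v in zip(dy, d1, d2))
print(reduced, closed_form, direct)
```

The agreement of `reduced` and `direct` rests on $b_1$, $b_2$ satisfying the normal equations $Ab_1 + Pb_2 = Q$ and $Pb_1 + Bb_2 = R$.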


24.33. In a case such as that of Example 24.7 it is evident that a comparison of y-means between groups is affected by what we know about the x-values. If we know nothing about the latter, comparison of the y's is a univariate problem and can be treated by the methods already discussed, the difference of means, for example, being tested by the use of standard errors or the t-test. But suppose that our x's themselves are found to be different between groups and that there is significant correlation between x and y. Then it is possible that the relation, if any, between y's in different groups is not, so to speak, an inherent quality of the variation of y, but is merely a reflection of their dependence on the x's, which happen to exhibit significant differences. In Example 24.7, differences in
proficiency between groups may be due simply to differences of ability which were present 
before the training began and, if so, should be shown by differences between groups in the 
preliminary scores. We should not then be able to conclude from proficiency scores alone 
that training in one group had a more marked effect than in another. The differences 
were there before the training was applied. 

24.34. If, then, we require to consider the effects of training alone on the groups, we may "correct" the y-values by deducting the estimates

$$\bar y_{..} + b_a(x_{ij} - \bar x_{..}) \qquad (24.66)$$

or other more general regression equations. This, so to speak, allows for differences due to variations of the x-variate.

Assuming that one linear regression equation adequately describes the relationship between y and x, so that the corrected values are

$$y'_{ij} = y_{ij} - \bar y_{..} - b_a(x_{ij} - \bar x_{..}), \qquad (24.67)$$

we see that the difference of the corrected means of two classes, $\bar y'_{.j}$ and $\bar y'_{.k}$, is

$$\bar y_{.j} - \bar y_{.k} - b_a(\bar x_{.j} - \bar x_{.k}). \qquad (24.68)$$

This may be regarded as the sum of two parts which are independent. The estimated variance of the first part, $\bar y_{.j} - \bar y_{.k}$, is $2s^2/q$, where $s^2$ is the mean-square of the residual after correcting for regression and the means of y and x are both based on q members. Similarly the variance of $b_a$ is $s^2/A$, where A is the sum of squares of the x-variate entering into the residual row of the analysis. Regarding the x's as fixed from sample to sample, so that our inference is conditional, we see that the variance of the difference (24.68) is given by

$$s^2\left\{\frac{2}{q} + \frac{(\bar x_{.j} - \bar x_{.k})^2}{A}\right\}. \qquad (24.69)$$

The ratio of the difference to the square root of this expression is distributed as Student's t with degrees of freedom one fewer in number than those of the original residual.
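For a concrete illustration of (24.68) and (24.69), the sketch compares the first and third groups of Example 24.7, each with q = 10 recruits. It assumes a single within-groups regression $b_a$, which the analysis of Example 24.7 has already called into question, so the result is illustrative only. The means ($\bar y_{.1} = 54.2$, $\bar y_{.3} = 62.4$, $\bar x_{.1} = 59.4$, $\bar x_{.3} = 63.4$) come from the group totals about the origin 50, $s^2$ is the mean square of deviations from $b_a$ on N − p − 1 = 31 d.f., and A is $C_{11a}$; the function name `corrected_difference` is hypothetical.

```python
import math


def corrected_difference(ybar_j, ybar_k, xbar_j, xbar_k, b, s2, q, A):
    """Return the corrected difference (24.68), its variance (24.69) and the t-ratio."""
    diff = (ybar_j - ybar_k) - b * (xbar_j - xbar_k)
    var = s2 * (2 / q + (xbar_j - xbar_k) ** 2 / A)
    return diff, var, diff / math.sqrt(var)


# Within-groups sums from Table 24.10 (Example 24.7).
c11a, c12a, c22a = 2425.2, 2273.0, 3353.7333
b_a = c12a / c11a
s2 = (c22a - b_a * c12a) / 31  # residual mean square, N - p - 1 = 31 d.f.

# Groups 1 and 3: means of proficiency (y) and preliminary (x) marks.
diff, var, t = corrected_difference(54.2, 62.4, 59.4, 63.4, b_a, s2, q=10, A=c11a)
print(round(diff, 3), round(var, 3), round(t, 3))
```

This gives a t of about −1.56 on 31 d.f., well short of the 5 per cent point (about 2.04).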


24.35. Similarly, if we have two independent variables $x_1$ and $x_2$ the corrected difference of y-means is

$$\bar y_{.j} - \bar y_{.k} - \{b_1(\bar x_{1j} - \bar x_{1k}) + b_2(\bar x_{2j} - \bar x_{2k})\}, \qquad (24.70)$$

where temporarily we write $\bar x_{1j}$ for the mean of the variate $x_1$ in the jth class, and so on. The variance of the part in curly brackets may be derived by considering the variance of the general expression $\lambda b_1 + \mu b_2$. From the equations for $b_1$ and $b_2$ we have

$$b_1 = \frac{B\sum(yx_1) - P\sum(yx_2)}{AB-P^2}, \qquad b_2 = \frac{-P\sum(yx_1) + A\sum(yx_2)}{AB-P^2}, \qquad (24.71)$$

where, as in 24.32, A and B are the sums of squares for $x_1$, $x_2$, and P is the cross-product. Thus the coefficient of any y in $\lambda b_1 + \mu b_2$ is

$$\frac{(\lambda B - \mu P)x_1 + (\mu A - \lambda P)x_2}{AB-P^2}.$$

Since the y's are independent the estimated variance of $\lambda b_1 + \mu b_2$ is

$$\frac{s^2}{(AB-P^2)^2}\{A(\lambda B-\mu P)^2 + 2P(\lambda B-\mu P)(\mu A-\lambda P) + B(\mu A-\lambda P)^2\} = s^2\,\frac{\lambda^2B - 2\lambda\mu P + \mu^2A}{AB-P^2}. \qquad (24.72)$$

Thus for the estimated variance of the corrected difference (24.70) we have

$$s^2\left\{\frac{2}{q} + \frac{\lambda^2B - 2\lambda\mu P + \mu^2A}{AB-P^2}\right\}, \qquad (24.73)$$

where $\lambda = \bar x_{1j} - \bar x_{1k}$ and $\mu = \bar x_{2j} - \bar x_{2k}$. As usual, the difference divided by the square root of this quantity may be tested in the t-distribution.
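The algebra leading to (24.72) can be confirmed numerically. The sketch assumes hypothetical $x_1$, $x_2$ values already measured from their means, together with arbitrary λ, μ and $s^2$. It computes the variance of $\lambda b_1 + \mu b_2$ from the individual coefficients of the y's and from the closed form, and wraps (24.73) in a helper whose name is illustrative.

```python
# Sketch: numerical check of (24.72), then the variance (24.73).
# Hypothetical x1, x2 already measured from their means.
x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x2 = [-1.5, -2.5, 0.5, -0.5, 2.5, 1.5]
A = sum(u * u for u in x1)
B = sum(v * v for v in x2)
P = sum(u * v for u, v in zip(x1, x2))
D = A * B - P ** 2
lam, mu, s2 = 0.8, -1.3, 2.0  # arbitrary illustrative values

# Coefficient of each y in lam*b1 + mu*b2, as derived in the text.
coef = [((lam * B - mu * P) * u + (mu * A - lam * P) * v) / D for u, v in zip(x1, x2)]
var_direct = s2 * sum(c * c for c in coef)
var_formula = s2 * (lam ** 2 * B - 2 * lam * mu * P + mu ** 2 * A) / D


def corrected_difference_variance(s2, q, lam, mu, A, B, P):
    """Estimated variance (24.73) of the corrected difference (24.70)."""
    return s2 * (2 / q + (lam ** 2 * B - 2 * lam * mu * P + mu ** 2 * A) / (A * B - P ** 2))


print(var_direct, var_formula, corrected_difference_variance(s2, 10, lam, mu, A, B, P))
```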
24.36. Our account of the analysis of variance and covariance has not attempted
to cover all the applications of the method in particular directions. We have concentrated 
so far as possible on the fundamental ideas and the broad lines of analysis to which they 
lead. Some further developments will be given in later chapters, but we must refer the 
reader who requires a complete acquaintance with the subject to the references given at 
the end of this chapter and the preceding. We will conclude our exposition with three 
final comments. 

(a) Part of our hypothesis throughout has been that the residual element C has constant 
variance from one subclass to another. In Chapter 26 we shall discuss methods of testing
homogeneity in residual variance. For completeness we might perhaps have anticipated 
some of these tests in the present chapter, at least to the extent of exemplifying their use.
We have not done so mainly for reasons of economy in space ; but the omission of mention 
of the point in foregoing examples should not lead the reader to overlook (as many writers 
do overlook) the necessity for testing variance-homogeneity where possible, if it is required 
as part of the hypothesis. 

(b) In the majority of our examples we have proceeded at once to analyses of variance
or covariance without dwelling on points which would require attention in any practical
inquiry. For instance, since the primary function of many variance-analyses is to test 
the homogeneity of a set of class-means, the first stage would be to compute those means 
and examine whether they suggest any lack of homogeneity on intuitive grounds. Again, 
if heterogeneity is established, consideration of the means themselves, or of the primary 
data, will sometimes show how it arises. The student must never lose sight of his primary 
material. 

(c) Elaborating this point to some extent, we would emphasise that the analysis of 
variance, like other statistical techniques, is not a mill which will grind out results automatically without care or forethought on the part of the operator. It is a rather delicate instrument which can be called into play when precision is needed, but requires skill as well as enthusiasm to apply to the best advantage. The reader who roves among the literature of the subject will sometimes find elaborate analyses applied to data in order to prove something which was almost obvious from careful inspection right from the start ; or he will find results stated without qualification as significant without any attempt at critical appreciation. This is not the occasion to deliver a homily on the necessity for
self-discipline in the use of advanced theoretical techniques, but the analysis of variance 
would provide quite a good text for a discourse on that interesting subject. 


NOTES AND REFERENCES 

For the analysis of variance where subclass frequencies are unequal, see Brandt (1933) and an important paper by Yates (1934a). Wilks (1938e) has considered the subject from the theoretical viewpoint and exhibited the main results determinantally. For the missing plot technique see Allan and Wishart (1930) and Yates (1933b). For the analysis of
covariance see Fisher’s Statistical Methods, Bartlett (1934a), an appendix by E. S. Pearson 
to a paper by Wilsdon (1934), Brady (1935), Wishart (1936), and Day and Fisher (1937). 
The last-mentioned paper works through a practical example in some detail and will 
repay study. 

See also references to the previous chapter. 
EXERCISES

24.1. For a two-way classification with one member in each subclass show that, for normal variation,

$$\operatorname{cov}\{(\bar x_{j.}-\bar x_{..}),\ (\bar x_{.k}-\bar x_{..})\} = 0,$$

and hence that the sums $\sum_j(\bar x_{j.}-\bar x_{..})^2$ and $\sum_k(\bar x_{.k}-\bar x_{..})^2$ are independent. Examine how this breaks down for the non-orthogonal case.

24.2. Verify the arithmetic of Example 24.6.

24.3. Generalise formula (24.73) in the following way. If there are m independent variates, the variance of corrected differences is

$$s^2\left\{\frac{2}{q} + \frac{\sum_{r,s=1}^{m} A_{rs}\,l_r l_s}{|a_{rs}|}\right\},$$

where $l_r = \bar x_{rj} - \bar x_{rk}$, $A_{rs}$ is the cofactor of $a_{rs}$ in the determinant $|a_{rs}|$, and $a_{rs} = \sum x_r x_s$ summed over the sample.

(Wishart, 1936.)

24.4. Derive by the analysis of variance the test of a regression coefficient given in 22.19.




CHAPTER 25

THE DESIGN OF SAMPLING INQUIRIES 

Influence of Theory on Sampling Design 

25.1. The reader who is accustomed to handling the results of a sampling investigation
as they appear in everyday statistical work may have wondered more than once in previous 
chapters whether theory was not reaching out too far in advance of practice. It is true 
that for certain types of experimental inquiry, notably in agricultural and biological research, 
the precision of exact statistical tests does not seem out of place ; but in economic or social 
statistics, for example, there is often so much error and imperfection in the raw data that 
the application of refined methods of analysis would be a waste of time. It is clearly 
useless, and may even be dangerous, to exercise an elaborate mathematical technique on 
data which are suspect from the very start of the inquiry. If our theory is to be really 
serviceable to the statistician and not merely an enticing mental exercise it must be capable 
of solving practical problems. 

25.2. Now it has to be admitted that much of the material with which statisticians have to work at the present day cannot be treated by the methods expounded in the foregoing pages when sampling questions are concerned. The commonest reason, but by no means the only one, is that the sampling process by which the data were obtained was biassed. In such cases the statistician has to lay aside the refined implements of his craft and do the best he can with his refractory material in the light of his own judgment and common sense. A good deal of current statistical work is of this kind, and there is even
a section of thought which is inclined to depreciate the advanced theory of the subject as 
“ academic ” in the sense that it is too remote from practical affairs to be worth studying. 
The misunderstanding is not likely to be removed by the counter-accusation sometimes 
launched by theoreticians that the theory is quite capable of being applied by anyone who 
has the ability to comprehend it. 

25.3. Fortunately there is a growing realisation that the two points of view can 
often be reconciled by collecting the data in such a form that the theory can be applied to 
it. If only enough care is taken at the initial stages of an inquiry there is no need for the 
appearance of imperfect data which defy exact analysis. Knowing beforehand what 
theoretical instruments are at our disposal, and armed with a clear understanding of what 
questions we are trying to answer, we can frequently frame the investigation so as to maxi- 
mise the information acquired with the minimum of effort. In short, the scope and nature 
of our theory itself dictates, to some extent, the form which the sampling inquiry should 
assume. In former times the statistician was usually asked to extract information from data which were collected by inexpert agents, frequently for quite different purposes. Nowadays he is still in the same position in some respects, but sometimes he is called in to advise on the design of the inquiry and can, within limits, determine the form in which the data are collected. He can make his theory applicable by selecting his sample in the proper way.

25.4. The general theory of the design of sampling inquiries has not progressed far 
enough for us to be able to give a systematic account of it in this chapter. In some fields, 
particularly that of agricultural experimentation, it has reached quite an advanced degree
of perfection ; in others there remain many problems unsolved and possibly many more 
which have not yet even been formulated. At the risk of some discontinuity of treatment, 
therefore, we shall only give in this chapter a number of instances in which theoretical con- 
siderations exert a considerable effect on the scope of a sampling inquiry, in order to illus- 
trate the field to be covered. There are, of course, many factors which ultimately deter- 
mine the form of an investigation, such as cost and expenditure of time, but they will 
not concern us here. For the present we shall be concerned solely with the extent to which
theoretical considerations contribute to all the factors that have to be taken into account 
when an inquiry is designed. 

Some Preliminary Points

25.5. There are certain preliminary points which, though obvious enough when stated 
explicitly, are often overlooked and cause a good deal of bad design. 

(a) The fundamental object of sampling is to obtain information about a population, and it is of the first importance to begin with a clear idea of what that population is. Imagine, for instance, that we are asked to ascertain whether pasteurised milk has
a different feeding value from raw milk. In what population is this inquiry to be made : 
among children ? among the inhabitants of the British Isles ? among those who habitually 
drink milk or those who do not ? among townspeople or among country folk ? and so 
on. Again, suppose that we are given a new variety of barley and wish to know whether 
it has a heavier yield than a previously known type. Do we mean heavier in the usual 
barley-growing areas ? in every kind of climate or on the average over a series of different 
climatic conditions ? when subject to the same manurial treatments as those in current 
use ? and so on. 

(b) In a similar way, it is necessary to have an equally clear idea of what we are trying to find out about the population. In our example of raw and pasteurised milk, are we content to know that there is (or is not) a differential effect for children as a whole ? or do we wish to ascertain whether any such effect varies at different ages, between sexes, or according to nutritional standards ? What exactly should we like to know ? It is no use returning the facile reply "all about it" to this query, for our information must be limited in virtue of the finite size of our sample. We must make up our minds what information we require and which questions have priority if it becomes necessary to sacrifice some of them for practical reasons.

(c) Thirdly, we should consider what we know already about our population. This point becomes of particular importance when our prior knowledge indicates heterogeneity, for then we may, in effect, have to divide the population into sub-groups and sample separately from each. In our milk example, it is to be expected that children of different ages may react differently, or that children from lower-class schools may respond differently from those in middle-class schools. Or again, in our barley example, the two varieties may compare quite differently on Hertfordshire loam and on Lincolnshire chalk. It would be misleading to lump all the comparisons together when we have strong reason to suspect heterogeneity beforehand. In effect, prior knowledge of this kind frequently dictates the types of question we ask under (b), and the two are often different facets of the same problem.

(d) As an extension of the same point, we may notice that prior knowledge about the population sometimes indicates what sort of averages to use and what sort of tests of significance it is proper to apply. Crop-yields, for instance, are known to be distributed in a form approaching the normal, so that arithmetic means are good estimates of parent means and the tests based on normal theory may be applied. Accident statistics, on the other hand, are often distributed in a modified Poisson form ; income statistics in a J-shaped form, and so forth.

(e) A specification of the population and a decision as to the precise object of the 
inquiry will usually determine certain parameters which it is required to estimate or certain 
hypotheses for test. In general the problem is one of estimation, but not necessarily so. 
In our case of pasteurised and raw milk, for instance, we should probably wish to know 
the exact amount of the difference between the effects of the two (a matter of estimation), not merely whether a difference existed (a matter of significance). We then wish to know, before the inquiry begins, whether the estimates we shall have are going to be accurate enough for our purpose ; or alternatively, if the sample is of a given size, how accurate they will be. It may not always be possible to answer such a question completely beforehand, since the sampling variances will in general depend on quantities which have to be estimated
when the data are available, but it is always useful to consider in a general way what sort 
of magnitudes would be shown as significant and what values would leave us still in reason- 
able doubt. As a rule, matters such as this are closely related to sample size. 

(f) Finally, our estimates will be subject to experimental error and, in development of the last point, we have to try to find the form of experimental design which, while answering our questions, does so with the minimum error. From a slightly different standpoint, if we can determine the amount of error which is admissible, the problem is to find the design which achieves no more than that error with the minimum expenditure of effort. Furthermore, we require to be able to estimate the extent of probable errors. In short, we
require an efficient design, just as the engineer requires an efficient engine or the aircraft 
designer an efficient form of airscrew, and for exactly the same reasons. 

25.6. To sum up, our primary task in embarking on a sampling inquiry is to ascertain 
as accurately as possible what is the population under examination, and what is the informa- 
tion about it which we require. If, as usually is the case, that information concerns statis- 
tical characteristics such as means and variances, or more generally frequency-distributions, 
our second task is to design an inquiry which will provide estimates of these unknown 
quantities and will, at the same time, provide estimates of their sampling error. It is not 
always possible, as we shall see later, to obtain full satisfaction in the reduction of error 
and the estimation of error simultaneously. Increased accuracy of estimation may mean loss of precision in our estimate of sampling error, so that although we are nearer the truth we do not know how near. There does not appear to be any single rule which will cover all the cases that can arise. We shall refer to a particular case of some interest in 25.39.

Stratified Sampling 

25.7. We consider at the outset a case of fairly frequent occurrence in the sampling of existent populations. Suppose we are interested in the mean value of a variate x in some population $\Pi$, and that we know, or suspect, that the population is heterogeneous in the sense that we can delimit sub-populations $\Pi_1, \Pi_2, \ldots, \Pi_k$ in which the distributions according to x may differ. This type of case might, for example, arise if we were sampling the population of a town for income, there being districts, wards or even streets which are known to be inhabited by classes living at different income-levels.

Practical considerations alone may require that we draw a prescribed portion of the sample from each sub-population. For instance, with a town of 500,000 inhabitants it would be most tedious to sample by using random numbers applied to the whole town. We should probably divide the work among districts and blocks and select random samples within the blocks. This, however, is not to be confused with the division of the town into relatively homogeneous districts because of its heterogeneity. Either process is called stratification. The problem we shall discuss is this : If we have decided to draw a total sample of n members, and can assign at will the number $n_i$ drawn from the ith stratum subject to the condition $\sum_i n_i = n$, how should we choose the numbers $n_i$, or need we choose them at all ? Will our estimate of the mean value of x be better if we merely choose n members at random from $\Pi$, or can we improve it by controlling the numbers $n_i$ and not merely leaving them to chance ?

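25.8. Let $x_{ij}$ be the jth member of the sample from the ith sub-population, and let the latter contain a number $N_i$ of members with mean $\mu_i$ and variance $\sigma_i^2$. If $\mu$ is the mean of $\Pi$ we shall have

$$\mu = \frac{1}{N}\sum_{i=1}^{k} N_i\mu_i, \qquad N = \sum_{i=1}^{k} N_i. \qquad (25.1)$$

We shall now seek for parameters $\lambda_{ij}$ such that our estimator of $\mu$, say t, is given by

$$t = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\lambda_{ij}x_{ij}, \qquad (25.2)$$

that is to say, is a linear estimator in the observed variate-values. We shall seek for that estimator which is unbiassed and has minimum variance, i.e. for which

$$E(t) = \mu, \qquad (25.3)$$

$$E\{t - E(t)\}^2 = \text{minimum}. \qquad (25.4)$$

Substituting from (25.2) and (25.1) in (25.3), we find

$$E\Big\{\sum_i\sum_j\lambda_{ij}x_{ij}\Big\} = \frac{1}{N}\sum_i N_i\mu_i,$$

and since $E(x_{ij}) = \mu_i$ this gives

$$\sum_i \mu_i\sum_j\lambda_{ij} = \frac{1}{N}\sum_i N_i\mu_i. \qquad (25.6)$$

For this to be generally true we must have

$$\sum_j\lambda_{ij} = \frac{N_i}{N},$$

a first condition on the $\lambda$'s. If $\bar\lambda_i$ is the mean of the $\lambda_{ij}$ in the ith set we have

$$\bar\lambda_i = \frac{N_i}{n_iN}.$$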



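Now consider (25.4). The variance of t is the sum of k variances, for the samples from sub-populations are independent. Consider then the variance of $\sum_j\lambda_{ij}x_{ij}$, remembering that the population of $N_i$ members is finite, so that two different members of the same sample (drawn without replacement) have covariance $-\sigma_i^2/(N_i-1)$. We have

$$\text{variance} = E\Big\{\sum_j\lambda_{ij}(x_{ij}-\mu_i)\Big\}^2 = \sum_{j,l}\lambda_{ij}\lambda_{il}E\{(x_{ij}-\mu_i)(x_{il}-\mu_i)\}$$

$$= \frac{\sigma_i^2N_i}{N_i-1}\sum_j\lambda_{ij}^2 - \frac{\sigma_i^2}{N_i-1}\Big(\sum_j\lambda_{ij}\Big)^2$$

$$= \frac{\sigma_i^2}{N_i-1}\Big\{N_i\sum_j(\lambda_{ij}-\bar\lambda_i)^2 + n_i(N_i-n_i)\bar\lambda_i^2\Big\}. \qquad (25.8)$$

This is clearly minimised only if

$$\sum_j(\lambda_{ij}-\bar\lambda_i)^2 = 0, \qquad (25.9)$$

that is, if all the $\lambda$'s for any sub-population are equal. This is what we should expect on intuitive grounds, for there is no reason for weighting the sample members differently in the same sub-sample.

Our minimal variance, say v, is then given from (25.8), by summing over i, as

$$v = \sum_i\frac{\sigma_i^2}{N_i-1}\,n_i(N_i-n_i)\bar\lambda_i^2 = \frac{1}{N^2}\sum_i\frac{\sigma_i^2(N_i-n_i)N_i^2}{n_i(N_i-1)}.$$

This is a minimum for variations in $n_i$ subject to $\sum_i n_i = n$ if

$$\frac{\partial}{\partial n_i}\Big(v - \rho\sum_i n_i\Big) = 0, \qquad (25.10)$$

where $\rho$ is an undetermined constant. This yields almost at once

$$n_i \propto \sigma_iN_i\sqrt{\frac{N_i}{N_i-1}}. \qquad (25.11)$$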
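The allocation (25.11) and the variance v can be evaluated directly. This is a minimal sketch with hypothetical stratum sizes and standard deviations; it returns unrounded $n_i$ (rounding to integers is left aside) and treats $\sigma_i^2$ as the variance with divisor $N_i$, the convention under which two members drawn without replacement have covariance $-\sigma_i^2/(N_i-1)$. It compares v at the optimum with v under proportional allocation; the function names are illustrative.

```python
import math

# Sketch of the allocation (25.11) and the minimal variance v of 25.8.


def optimum_allocation(N, sigma, n):
    """Real-valued n_i proportional to sigma_i * N_i * sqrt(N_i / (N_i - 1)), summing to n."""
    weights = [s * Ni * math.sqrt(Ni / (Ni - 1)) for Ni, s in zip(N, sigma)]
    total = sum(weights)
    return [n * w / total for w in weights]


def minimal_variance(N, sigma, alloc):
    """v = (1/N^2) * sum sigma_i^2 (N_i - n_i) N_i^2 / (n_i (N_i - 1)), equal weights within strata."""
    n_pop = sum(N)
    return sum(s ** 2 * (Ni - ni) * Ni ** 2 / (ni * (Ni - 1))
               for Ni, s, ni in zip(N, sigma, alloc)) / n_pop ** 2


# Hypothetical strata (sizes and standard deviations) and total sample size.
N = [400, 250, 350]
sigma = [5.0, 12.0, 8.0]
n = 60
best = optimum_allocation(N, sigma, n)
proportional = [n * Ni / sum(N) for Ni in N]
print([round(x, 1) for x in best])
print(minimal_variance(N, sigma, best), minimal_variance(N, sigma, proportional))
```

Since v is a sum of terms $c_i(N_i/n_i - 1)$ with positive $c_i$, it is convex in the $n_i$, so the stationary point found by the undetermined multiplier is the minimum over allocations with the same total.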


25.9. If we know the population variances $\sigma_i^2$ and the numbers $N_i$ this equation determines the numbers $n_i$ ; but in practice it is rather unlikely that we should know the variances without knowing the means, in which case we should not have to sample to find the mean of the whole population. Our result is not, however, useless. In the first place we find for the estimator t

$$t = \frac{1}{N}\sum_i N_i\bar x_i, \qquad (25.12)$$

where $\bar x_i$ is the mean of the sample from the ith stratum, so that the estimate is a weighted average of the sample means, the weights being proportional to the population numbers $N_i$, not to the numbers $n_i$. Secondly, without knowing the variances $\sigma_i^2$ exactly, we may sometimes reach approximations from prior knowledge of the populations. Such values, without giving absolute accuracy, will at least represent improvements on selecting the n's by chance.
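The estimator (25.12) is simple to apply once the samples are drawn. The sketch below uses an artificial population generated with Python's `random` module (three hypothetical strata at different income levels). It draws samples without replacement, forms t and, for contrast, the unweighted mean of the pooled sample, which weights strata by $n_i$ rather than $N_i$. The function name `stratified_mean` is illustrative.

```python
import random

# Sketch of the estimator (25.12) on a hypothetical, artificially generated population.


def stratified_mean(samples, N):
    """t = (1/N) * sum N_i * (mean of the sample from stratum i)."""
    return sum(Ni * sum(s) / len(s) for Ni, s in zip(N, samples)) / sum(N)


rng = random.Random(1)
# Three hypothetical strata at different income levels: (mean, sd, size).
spec = [(20.0, 3.0, 400), (35.0, 8.0, 250), (50.0, 6.0, 350)]
strata = [[rng.gauss(m, sd) for _ in range(size)] for m, sd, size in spec]
N = [len(s) for s in strata]
population_mean = sum(sum(s) for s in strata) / sum(N)

alloc = [15, 23, 22]  # sample numbers n_i (sum 60)
samples = [rng.sample(s, k) for s, k in zip(strata, alloc)]
t = stratified_mean(samples, N)
naive = sum(sum(s) for s in samples) / sum(alloc)  # weights strata by n_i, not N_i
print(round(population_mean, 2), round(t, 2), round(naive, 2))
```

If every stratum were sampled completely ($n_i = N_i$), t would reproduce the population mean exactly.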
25.10. If the numbers $N_i$ are effectively infinite the formulae simplify, and, for instance, instead of (25.11) we have

$$n_i \propto \sigma_iN_i, \qquad (25.13)$$

the sample number varying with the standard deviation in the stratum concerned, as well as its number of members.

25.11. If there is no information available at all about the variances the most reasonable course in applying (25.11) appears to be to suppose them all equal. In such a case, for large $N_i$, we have

$$n_i \propto N_i, \qquad (25.14)$$

or the sampling numbers are proportional to the population numbers. This is what we might expect on intuitive grounds. If the populations are infinite the $n_i$'s are equal, which again is in accordance with intuitive ideas.
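When every $N_i$ is large, v reduces to $(1/N^2)\sum_i N_i^2\sigma_i^2/n_i$, and the two rules (25.13) and (25.14) can be compared. The sketch uses hypothetical strata. With allocation proportional to $N_i\sigma_i$ the variance equals $(\sum_i N_i\sigma_i)^2/(N^2n)$, and with allocation proportional to $N_i$ it equals $\sum_i N_i\sigma_i^2/(Nn)$, so the first is never larger (by the Cauchy-Schwarz inequality). The helper names are illustrative.

```python
# Sketch comparing (25.13) and (25.14) when every stratum is large, using
# v = (1/N^2) * sum N_i^2 sigma_i^2 / n_i (the large-N_i form of the text's v).


def large_strata_variance(N, sigma, alloc):
    n_pop = sum(N)
    return sum(Ni ** 2 * s ** 2 / ni for Ni, s, ni in zip(N, sigma, alloc)) / n_pop ** 2


def allocate(weights, n):
    """Split n in proportion to the weights (real-valued, not rounded)."""
    total = sum(weights)
    return [n * w / total for w in weights]


N = [40000, 25000, 35000]  # hypothetical large strata
sigma = [5.0, 12.0, 8.0]
n = 600
by_sd = allocate([Ni * s for Ni, s in zip(N, sigma)], n)  # (25.13)
by_size = allocate(N, n)                                  # (25.14)
for label, alloc in [("(25.13)", by_sd), ("(25.14)", by_size)]:
    print(label, [round(a) for a in alloc], round(large_strata_variance(N, sigma, alloc), 4))
```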

25.12. The above will serve as an illustration of the way in which theoretical require- 
ments can influence the scope of an inquiry conducted among an existent population. By 
seeking for an estimator with minimum variance we have been led to expressions deter- 
mining the allocation of sample numbers among the different strata — and incidentally, of 
course, we have derived expressions for the minimum variance, so that the maximum 
possible precision can be ascertained. The fact that some of our results depend on unknown 
constants suggests that in some circumstances it may be worth while conducting a pre- 
liminary or ‘‘ pilot ” inquiry in order to estimate the unknowns and hence to improve the 
precision of the main inquiry which is to follow. The possibilities of such pilot surveys 
have yet to be explored, but the technique appears to merit serious investigation. 

25.13. In passing, we may mention one other topic of great practical importance on 
which theory can throw a good deal of light, that of optimum size of a sampling unit. In 
sampling a human population of a town, for instance, need we take individuals as our 
units ? It would be easier to sample households, or streets, or even whole districts ; but 
do we lose anything by this method, and if so, how much ? Furthermore, the grouping of 
individuals into units of larger size sometimes has a peculiar effect on correlations which 
may lead to erroneous conclusions, and a theoretical investigation may be required to safe- 
guard against error. We shall not pursue the subject further here — the sampling problem 
would require a book in itself— but the reader who is interested may like to consult some 
of the papers referred to at the end of the chapter. 

The Design of Experiments 

25.14. For an existent population the flexibility of sampling technique is somewhat 
limited. We are given an aggregate of values, some of which are to be extracted for scrutiny, 
and no manipulation of the sampling can tell us more than exists, so to speak, already 
inscribed upon the population itself. Consequently the main line of endeavour in such 
cases lies in estimating with the greatest accuracy (which is largely a matter of choosing 
the right statistics and minimising sampling variability), or in ensuring that sufficient 
material is available to enable the requisite comparisons to be made with significance 
(which is largely a matter of sample size and of selecting the most suitable tests of significance).
Nothing can alter the population, and theory will, as a rule, only react upon the sampling 
process by some such method as has already been exemplified, e.g. in dictating that the 
sampling must be random, in stratifying the population before the sampling is carried out,
and in deciding how limited resources can be expended to the best advantage. 

25.15. For hypothetical populations there are often wider possibilities, for the nature 
of the inquiry may itself determine which populations are to be studied, and the populations 
may, to a certain extent, be set up at will. For instance, if we are interested in an inquiry 
into the relationship between income and size of family in the United Kingdom, the popula- 
tion already exists and we cannot go outside it ; whereas if we wish to discuss the effect 
of a poison on bacterial growth or of a fertiliser on the yield of barley we can not only 
reproduce experimental data ad libitum but can arrange the inquiry so as to confine it to
certain populations (e.g., by considering only a given type of bacterium in fixed nutritional 
circumstances or at fixed temperatures), or we may extend the domain of consideration as 
far as purely practical limitations will allow (e.g., by growing barley in new surroundings 
or in new climates). This is rather a pretentious way of saying that we may experiment
in a domain which, within limits, can be assigned at will. The statistician has a much
greater scope for ingenuity in the design of experiments than in the design of sampling
inquiries on existent populations because of the greater degree of control over the population 
under examination. 

25.16. In the classical ideal experiment, only the factors under consideration were 
allowed to vary, other conditions being kept as constant as laboratory practice would allow 
— in investigations concerning the relation between resistance and current in an electric 
circuit, for instance, attempts would be made to keep factors such as temperature and 
external magnetic effects strictly constant. It would be recognized that there would be 
residual errors which would affect the exactitude of the results, but these would be measur- 
able on certain assumptions. 

25.17. Statistical theory can, of course, deal with such cases, but it can also go farther 
and often wishes to do so. In the first place, it frankly admits the existence not only of 
experimental error (in the sense of aberration from a “ true ” value) but of the much wider 
type of variation which gives rise to frequency-distributions in practice. Instead of isolating 
particular factors for study, it may wish to give full play to the disturbances which arise 
in practice in order to investigate what happens in “ natural ” conditions. For this reason,
statistical experiments are often complex in the sense that a number of factors are allowed
to vary simultaneously.

Secondly, the admission of outside influences which together make up what is generally 
called experimental error implies that it should be possible to estimate the extent of such 
error from the data themselves. We wish to obtain, not the functional relations between
variables which may only exist under artificial conditions, but the stochastic relations
observed in practice. 

25.18. The effect of this on experimental design is that the hypothetical population 
we consider is often a rather general one. Taking the case of trials of a new variety of 
barley as an example, we should wish to compare its yields with those of other varieties 
in different soil conditions, with different manurial treatments, in different years (so as to 
get variations in climate), and so on. (Furthermore, to obtain estimates of the error due
to other factors we usually have to replicate the experiment.) A great number of
intercomparisons fall to be made, and the process of design is essentially that of finding a form
of experiment which will permit all these comparisons and yet save as much unnecessary
labour as possible. 

Orthogonality

25.19. To reduce the discussion to more concrete terms we will consider the testing
of a new variety of barley. In order to study its behaviour under different soil conditions 
we will select a number of areas in which barley is grown and choose a block of ground in 
each. This will give us inter-soil comparisons. We will also arrange to carry the experi- 
ment on for a period of years, so that climatic variations may also be compared. The 
other factor in which we are interested is the response to certain manures, which we will 
take to be dung (D), potash (K), nitrogen (N) and phosphates (P).

Consider any block at any one place in any year. (We will decide on certain standard
quantities of the four manures and assume that for any manure either a dressing of this
standard amount is to be given, or it is to be withheld.) This simplifies the experiment,
for then every manure either is or is not applied, and our results can be classified by simple 
dichotomies. Of course more complicated experiments can be devised to allow for different 
quantities of fertiliser, but the simpler case will be sufficient for our purposes. 

We have then set up a population which can be classified according to six qualities, 
place, time, and the application of four manures. Our results are intended to show whether 
there is any variation in yield between these conditions and various combinations of them. 
Of course, it does not follow in deductive logic that if there is significant variation from year 
to year in the particular years chosen there will always be temporal or climatic variation ; 
and similarly, if there is significant variation from place to place it does not follow that 
other soil conditions which have not been tested will show a significant variation. To 
arrive at such conclusions we have to perform an ordinary generalisation by induction. 
What we shall say, if significant results appear, is that in the regions tested, or for the years 
tested, there were significant variations, and that it therefore appears likely that soil and 
climate exert a material effect on yield — and we shall maintain this with more or less con- 
fidence according as our experience is wider or narrower. This is the familiar inductive 
inference which forms the basis of all scientific inquiry. 

25.20. Within any one block we shall wish to study the effect of manurial treatments
not only separately but in combination. We therefore divide the block into sixteen com-
partments and treat them, respectively, with no manure, D, K, N, P, DK, DN, DP, KN,
KP, NP, KNP, DNP, DKP, DKN and DKNP. Here every possible combination appears
once and only once. To compare, for instance, the mean yields in the presence or absence 
of dung we add all the eight yields for plots on which no dung was spread and compare it 
with the sum of the other eight. All the necessary comparisons can be made. 
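
The dung comparison just described can be sketched as follows. The sixteen treatments are coded as 0/1 tuples over (D, K, N, P). Each main effect is the mean of the eight plots receiving a manure minus the mean of the eight without it. The yields are invented and additive, purely for illustration. They are not data from the text.

```python
from itertools import product

MANURES = "DKNP"

# The 16 treatments: each manure either applied (1) or withheld (0).
treatments = list(product((0, 1), repeat=4))

def label(t):
    """Name a treatment by the manures it receives, e.g. (1, 0, 1, 0) -> 'DN'."""
    return "".join(m for m, on in zip(MANURES, t) if on) or "none"

def main_effect(yields, manure):
    """Mean yield of the 8 plots receiving `manure` minus that of the 8 without it."""
    k = MANURES.index(manure)
    with_m = [yields[t] for t in treatments if t[k] == 1]
    without = [yields[t] for t in treatments if t[k] == 0]
    return sum(with_m) / len(with_m) - sum(without) / len(without)

# Hypothetical additive yields for one block (not taken from the text).
yields = {t: 20 + 3 * t[0] + 1 * t[1] + 2 * t[2] + 0.5 * t[3] for t in treatments}
print(sorted(label(t) for t in treatments))
print(main_effect(yields, "D"))  # 3.0
```

Because each manure is present on exactly half the plots, and is balanced against every combination of the others, the comparison recovers the dung effect free of the other manures.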

Data of this kind are said to be orthogonal. Each possibility arises an equal number of
times. The reason for the use of the word is that such material is orthogonal in the sense
we have considered in the analysis of variance. We saw in Chapters 23 and 24 that where
cell-frequencies were equal the analysis was greatly simplified, and that under the
customary hypotheses the estimates of means were independent. It is not, of course, absolutely
necessary to have orthogonal data (in fact, we have shown in Chapter 24 how to deal with
the non-orthogonal case) ; but it is evidently a great convenience to be able to arrange
for orthogonality, and no efficiency is lost by doing so.
Replication

25.21. If, as suggested above, we divide each block into 16 plots and treat each differ-
ently, the analysis of variance of any block will have 15 degrees of freedom ; and if we
cannot ignore any of the interactions there will be no residual variance due to “ error ”,
that is to say we cannot estimate the reliability of our comparisons. All the 15 possible
independent comparisons may be made, but we cannot decide whether differences are 
significant in the sense that they may be due to the other factors which we have agreed 
to allow to bear on the experiment, such as individual soil differences from plot to plot. 
If we are to estimate such “ error ” we must give the factors which produce it an oppor- 
tunity of varying. This may be done by replicating the experiment, that is to say, by 
repeating it in the same form. For instance, suppose that we set up four blocks and divide 
each into 16 plots, applying our manurial treatments to each block. Then, assuming that
there are no significant interactions between blocks and treatments (a matter which we
can test by examining the interaction terms in the variance-analysis), we shall have 63
degrees of freedom, of which 15 are assignable to treatments and their interactions and the
remaining 48 to a “ residual ” term, the latter providing an estimate of experimental error.
We have exemplified this process in Chapter 23. 


Randomisation 

25.22. Up to this point we have said nothing about the arrangement of our 16 plots
within the block. Suppose we divide our block into plots of equal size. Is there any 
advantage in allocating the treatments systematically, or is it preferable to assign them 
at random ? 

We shall consider the relative merits of random and systematic arrangements in more 
detail below, but we can announce the general rule now : unless there is some good reason
to the contrary, it is better to allot the treatments at random. Where possible, chance
should be given full play.

25.23. The justification for this rule in our present instance can be seen by reference
to the section on randomised blocks in 23.41. We saw there that by randomising the
allocation of plots we were able to preserve the z-distribution and hence to validate our
tests of significance, even where normality in the parent form was not assumed. The
process is essentially one of extending our hypothetical population. Instead of considering 
the observed yields as specimens of what might happen in repeated trials of the same variety 
of barley if the same manurial treatments were applied to the same plots, we consider the 
possible yields in repeated trials if the manurial treatments were applied in all possible 
ways to different plots. Our experiment is systematic in the sense that we prescribe a
different treatment for each plot ; it is random to the extent that we allot the treatments
to plots by chance.
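
A minimal sketch of the random allotment for four replicate blocks of the sixteen manurial treatments follows. Each block receives every treatment once, in an independently shuffled plot order. Python's pseudo-random generator stands in for the Sampling Numbers of the period, and the seed is arbitrary.

```python
import random

def randomised_blocks(treatments, n_blocks, seed=None):
    """Return one layout per block: every treatment appears exactly once in each
    block, assigned to plots 1, 2, ... in an independently shuffled order."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)
        rng.shuffle(order)        # random allocation of treatments to plots
        layout.append(order)
    return layout

TREATMENTS = ["none", "D", "K", "N", "P", "DK", "DN", "DP", "KN", "KP", "NP",
              "KNP", "DNP", "DKP", "DKN", "DKNP"]

plan = randomised_blocks(TREATMENTS, n_blocks=4, seed=1)
for i, block in enumerate(plan, start=1):
    print(f"block {i}:", " ".join(block))
```

The design stays "systematic" in what each block contains and "random" only in where each treatment falls, which is the distinction drawn above.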

25.24. There is one source of possible confusion here which it is desirable to remove. 
In our agricultural example complications arise because of the physical contiguity of the 
plots, and we shall see below that it is often desirable to eliminate by special designs system-
atic fertility gradients in the soil. In other classes of experiment where we desire orthogon-
ality, the members need not be subject to this kind of effect, and often are not. Reverting 
to the example of raw versus pasteurised milk which has already been mentioned, suppose 
we take a simplified case and wish to measure whether the two different milks have different 
effects on boys and girls. With a class of 40 children, 20 boys and 20 girls, we can proceed
in several ways. It is obviously useless to give raw milk to all the boys and pasteurised
milk to all the girls, for then we have no measure of the differential effect, if any, for either
sex alone. We might toss up in each case and allot raw or pasteurised milk to each child
by chance ; but this would probably make the data non-orthogonal. To attain orthogon-
ality, we should allot 10 children to each of the four sub-groups BP, GP, BR, GR (where
B = boy, G = girl, P = pasteurised, R = raw). We then have the following analysis of variance :


                                              Degrees of freedom

Between sexes . . . . . . . . . . . . . . . .          1
Between milks . . . . . . . . . . . . . . . .          1
Residual (including interactions) . . . . . .         37

Total . . . . . . . . . . . . . . . . . . . .         39


This is analogous to a test of a cereal with two fertilisers and 10 replications. 

The question is, how should we allot the children to the four groups ? Their sex, of 
course, is determined, but the nature of the milk they receive is at choice. It is here that 
the randomisation will help. The ten children of a specified sex who receive raw milk 
should be chosen at random from the 20 available. In this instance it might be thought 
that any method would do ; but it is best to avoid the risk of bias. If the children were
chosen by the teacher he might tend to select the 10 bigger boys or the 10 brighter boys. 
If they were chosen alphabetically, we might get brothers and sisters automatically receiv- 
ing the same treatment ; and so on. The randomisation process avoids all systematic 
effects of this kind and brings us a stage nearer to obtaining an unbiassed answer to our
question.
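
The allotment described here can be sketched as follows. Within each sex, half the children are chosen at random to receive raw milk and the rest receive pasteurised milk, which gives the four orthogonal sub-groups of 10. The class list is hypothetical, and the pseudo-random generator stands in for a table of random numbers.

```python
import random

def allot_milk(boys, girls, seed=None):
    """Within each sex, choose half the children at random for raw milk (R);
    the others get pasteurised milk (P). Returns {child: (sex, milk)}."""
    rng = random.Random(seed)
    plan = {}
    for sex, children in (("B", boys), ("G", girls)):
        raw = set(rng.sample(children, len(children) // 2))
        for child in children:
            plan[child] = (sex, "R" if child in raw else "P")
    return plan

boys = [f"boy{i}" for i in range(1, 21)]      # hypothetical class list
girls = [f"girl{i}" for i in range(1, 21)]
plan = allot_milk(boys, girls, seed=7)
```

Randomising within each sex, rather than across the whole class, is what keeps the four cells equal in size while still removing the teacher's (or the alphabet's) influence on who gets which milk.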

Sensitivity of a Test 

25.25. In some cases, where the variate is discontinuous, the nature of the test of 
significance which we propose to apply may make a difference to the form of the experiment. 
If we are testing a certain hypothesis which can produce a specified number m of experi- 
mental results which are acceptable as conforming to the hypothesis, whereas other 
hypotheses produce a number n of other results, we clearly want to keep m as small as 
possible compared with n. The ideal case, of course, is that of the “ crucial ” experiment
in which the hypothesis can only give one result and other hypotheses give a different 
result. The result then proves or disproves the truth of the hypothesis, and no test of 
significance arises. In statistical practice we do not as a general rule perform crucial 
experiments, but we can sometimes design an experiment so that it is more crucial, if the 
expression be allowed, than alternative methods. 

25.26. Consider, for instance, the case of a cashier who claims to be able to detect
good money from false at a glance. To test this ability we spread ten coins before him,
tell him that p are good, and ask him to point them out. What number of good coins p 
should we include among the ten ? * 

If the cashier had no power of discrimination and there are p good coins, the proba-
bility that he would guess right by chance is

$$1 \Big/ \binom{10}{p},$$

for the total number of ways of selecting p from 10 is the denominator of this fraction and
only one of them is right. Now we want to choose p so as to minimise the probability of
such an event, i.e. so as to maximise $\binom{10}{p}$. This is clearly done when p = 5, so that we
ought to have five good and five bad coins in the set. Any other number would increase
the probability that he might be right by chance and hence decrease the sensitivity of the
experiment.
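
The arithmetic can be checked directly. The sketch tabulates the chance probability $1/\binom{10}{p}$ for every p and picks the p that makes a lucky guess least likely.

```python
from fractions import Fraction
from math import comb

# Probability that a cashier with no discrimination picks exactly the p good
# coins out of ten purely by chance: one correct choice out of C(10, p).
chance = {p: Fraction(1, comb(10, p)) for p in range(11)}

best_p = min(chance, key=chance.get)
print(best_p, chance[best_p])  # 5 1/252
```

With five good coins the chance of a correct guess is 1 in 252. With four (or six) it rises to 1 in 210.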


Latin Squares

25.27. We now proceed to consider a different type of design, which has been freely
applied in agriculture but may also be applied to other forms of inquiry. Suppose we 
have a variety of barley to test and five different treatments to apply. We will assume 
that replication has been considered necessary and will replicate five times, the same number 
as the treatments. We will then divide our block into 25 plots like a chessboard (though 
the plots may be rectangular and need not be exact squares, provided they are all the same 
size). Each row may be considered a replication of the five treatments, and this itself 
involves the appearance of each treatment once and only once in each row. Can we extend 
the arrangement and ensure that in addition the treatments will occur just once in each 
column ? 

The answer is affirmative, as the following example shows :

        A   B   C   D   E
        B   C   A   E   D
        C   E   D   A   B          (25.15)
        D   A   E   B   C
        E   D   B   C   A


An arrangement of this kind is called a “ Latin square ”. It was studied extensively by
Euler in the eighteenth century, though not of course from the statistical viewpoint.

25.28. The advantage of this arrangement lies in the fact that it eliminates possible 
correlational effects due to fertility gradients in the soil or accidental circumstances which 
may exercise a “ patchy ” influence on the whole block. If we could be sure that there 
were no such influences at work, and that the soil was entirely homogeneous in the block, 
it would not matter where the treatments were placed ; but by imposing the restriction 
that no treatment appears more than once in the same row or column we remove at least 
horizontal and vertical gradients from our comparisons. Suppose in fact that there were 
gradients running across the block and down it. When we work out the mean yield of the 
treatment A we shall add together five values, one in each row and one in each column.
Similarly for B, so that a comparison of A and B is not affected by the systematic influences, 
which work equally on both. 

It is not, of course, true that the Latin square arrangement eliminates every effect due 
to soil heterogeneity. There might be systematic effects running diagonally which might 
still remain. It is, however, clear that in removing the effects in two perpendicular direc- 
tions we have substantially improved the comparison of mean yields as compared with 
a systematic arrangement. 
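
Both properties can be checked numerically. The sketch stores the square (25.15) as five strings and verifies that it is a Latin square. It then invents yields made up of a treatment effect plus linear gradients running down and across the block. The effects and gradients are hypothetical. Because each treatment meets every row and every column once, each treatment mean picks up the same gradient contribution, so differences between treatment means recover the treatment effects exactly.

```python
SQUARE = ["ABCDE",
          "BCAED",
          "CEDAB",
          "DAEBC",
          "EDBCA"]   # the square (25.15), one string per row

def is_latin(square):
    """True if every symbol occurs exactly once in each row and each column."""
    symbols = set(square[0])
    n = len(symbols)
    rows_ok = all(len(row) == n and set(row) == symbols for row in square)
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return len(square) == n and rows_ok and cols_ok

def treatment_means(square, yields):
    """Mean yield of each treatment over the plots carrying it."""
    p = len(square)
    totals = {}
    for r in range(p):
        for c in range(p):
            t = square[r][c]
            totals[t] = totals.get(t, 0) + yields[r][c]
    return {t: s / p for t, s in totals.items()}

# Hypothetical yields: treatment effect + gradient down the rows + gradient across the columns.
effect = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 8}
yields = [[effect[SQUARE[r][c]] + 2.0 * r + 0.5 * c for c in range(5)] for r in range(5)]
means = treatment_means(SQUARE, yields)
print(means)  # every mean is its effect plus the same constant 5.0
```

A diagonal gradient, by contrast, would not in general cancel, which is the limitation noted in the text.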


25.29. The analysis of variance of a p × p Latin square may be carried out in the
following form :

                                   Sum of squares        d.f.

Between rows . . . . . . . .                             p − 1
Between columns . . . . . .                              p − 1
Between treatments . . . . .                             p − 1
Residual . . . . . . . . . .                             (p − 1)(p − 2)

Total . . . . . . . . . . . .                            p² − 1          (25.16)

and the four constituent sums are, on the hypothesis of homogeneity, distributed as
$\sigma^2\chi^2$ independently. Before proving this result we will consider an example.

Example 25.1 (from Thomson, Brit. J. Educ. Psych., 1941, 11, 135 ; data by S. D. Nisbet). 

A set of children were divided into four equal groups and each group was given four 
lists of words to test spelling ability. Each list formed one of four different types of test 
which we denote by A, B, C, D. The arrangement of the experiment is shown in the
following table, together with the total scores of the corresponding groups : — 


                                   Groups of children

                            1        2        3        4        Totals

                   1      A  81    B  41    C  44    D  53        219
   Lists of        2      D  38    A  97    B  42    C  49        226
   words           3      C  31    D  43    A  67    B  36        177
                   4      B  57    C  33    D  43    A  81        214

   Totals                  207      214      196      219         836


For instance, the first group of children had the first list of test A, the second of test 
D, and so on. No group had the same lists as another group, and each list was used exactly 
once. The scores (corresponding to yields in the agricultural case) were in fact the number 
of words spelled wrongly in a prior test but correctly in this test. 

The above table, of course, does not represent anything corresponding to the physical
layout of an agricultural experiment, but it shows how a similar object can be secured as
regards the avoidance of contiguous effects. Since it is possible that some relationship may exist
between the lists of words and the tests (e.g. by accident one list might be particularly
unsuitable for a test), we wish to ensure that not only will each group of children have
the four tests, but that no list shall be given more than once and every list at least
once. This is precisely what the Latin square accomplishes. The fact that the diagonal
arrangement of the letters is systematic does not affect the present inquiry, though in an
agricultural experiment a systematic diagonal fertility gradient might affect comparisons
between treatments.

An analysis of variance on the usual lines gives the following results :

                                Sum of squares     d.f.     Quotient

Lists (rows) . . . . . . . .         359.5           3        119.83
Groups (columns) . . . . . .          74.5           3         24.83
Tests (treatments) . . . . .        4626.5           3       1542.17
Residual . . . . . . . . . .         606.5           6        101.08

Totals . . . . . . . . . . .        5667.0          15


The differences between lists are evidently not significant (their quotient, 119.83, is only
slightly larger than the residual, 101.08), from which we should conclude
that they appear to be on a par so far as these tests are concerned. The quotient due to
groups indicates that the children are more alike than chance would lead us to expect, but
not significantly so, for the variance ratio 101.08/24.83 = 4.1, $\nu_1 = 6$, $\nu_2 = 3$, is not signifi-
cant. On the other hand, the quotient due to tests is very significant, the ratio
1542.17/101.08 = 15.3, $\nu_1 = 3$, $\nu_2 = 6$, being beyond the 1-per-cent point. We conclude
that there do exist differences between the tests.
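
The sums of squares in the table can be recomputed from the scores. The sketch takes rows as lists of words and columns as groups of children. The last row is read as B 57, C 33, D 43, A 81, since those are the values that reproduce the printed column totals (207 and 214) and all four printed sums of squares. The residual is found by subtraction, as in the usual hand computation.

```python
# Example 25.1: rows = lists of words, columns = groups of children.
LETTERS = ["ABCD",
           "DABC",
           "CDAB",
           "BCDA"]
SCORES = [[81, 41, 44, 53],
          [38, 97, 42, 49],
          [31, 43, 67, 36],
          [57, 33, 43, 81]]

def latin_square_anova(letters, x):
    """Sums of squares for rows, columns, treatments, total, and residual
    (by subtraction) for a p x p Latin square with cell values x[r][c]."""
    p = len(x)
    grand = sum(map(sum, x))
    cf = grand ** 2 / p ** 2                        # correction for the mean
    row_t = [sum(row) for row in x]
    col_t = [sum(x[r][c] for r in range(p)) for c in range(p)]
    trt_t = {}
    for r in range(p):
        for c in range(p):
            trt_t[letters[r][c]] = trt_t.get(letters[r][c], 0) + x[r][c]
    ss = {
        "rows": sum(t * t for t in row_t) / p - cf,
        "columns": sum(t * t for t in col_t) / p - cf,
        "treatments": sum(t * t for t in trt_t.values()) / p - cf,
        "total": sum(v * v for row in x for v in row) - cf,
    }
    ss["residual"] = ss["total"] - ss["rows"] - ss["columns"] - ss["treatments"]
    return ss

ss = latin_square_anova(LETTERS, SCORES)
df = {"rows": 3, "columns": 3, "treatments": 3, "residual": 6}
mean_sq = {k: ss[k] / df[k] for k in df}
print(ss)
print(round(mean_sq["treatments"] / mean_sq["residual"], 2))  # about 15.26
```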

Construction of Latin Squares 

25.30. The number of possible Latin squares of order p is very large for high values
of p. There are, for example, 576 squares of order 4 ; 161,280 squares of order 5 ; 812,851,200
of order 6 and 61,479,419,904,000 of order 7. Up to this order they have been enumerated.
Although many examples of squares of higher orders are known, the problem of enumeration
for p > 7 awaits solution. Details and examples will be found in Fisher and Yates’
Statistical Tables.

By interchanging rows and columns the square can always be brought to a form in
which the top row and left-hand column are in the order ABC, etc. It is then said to be
a “ standard square ”. For instance, there are four standard squares of the fourth order :

A B C D      A B C D      A B C D      A B C D
B A D C      B C D A      B D A C      B A D C
C D B A      C D A B      C A D B      C D A B        (25.17)
D C A B      D A B C      D C B A      D C B A

From each of these, 144 (= 4! 3!) squares may be derived by permuting all columns and
all rows except the first. (There is no point in permuting the first row, because the result
would be a repetition of squares already obtained with an interchange of the letters
A . . D, not an essentially different layout.) The total number of squares, as stated
above, is therefore 4 × 144 = 576.
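
The counting argument can be checked by brute force for small orders. The sketch counts standard squares by backtracking over the cells, with the first row and first column fixed in natural order and symbols written 0, 1, ..., p − 1. It then multiplies by p!(p − 1)! for the column and row permutations. This simple search is practical only for small p.

```python
from math import factorial

def count_standard_squares(p):
    """Count p x p Latin squares (symbols 0..p-1) whose first row and first
    column are both 0, 1, ..., p-1, by backtracking cell by cell."""
    grid = [[None] * p for _ in range(p)]
    for i in range(p):
        grid[0][i] = i          # first row in natural order
        grid[i][0] = i          # first column in natural order

    def fill(cell):
        if cell == p * p:
            return 1                        # every cell filled: one complete square
        r, c = divmod(cell, p)
        if r == 0 or c == 0:
            return fill(cell + 1)           # border cells are already fixed
        used = set(grid[r][:c]) | {grid[i][c] for i in range(r)}
        found = 0
        for s in range(p):
            if s not in used:
                grid[r][c] = s
                found += fill(cell + 1)
        grid[r][c] = None
        return found

    return fill(0)

for p in (4, 5):
    standard = count_standard_squares(p)
    # each square arises from a standard one by p! column and (p-1)! row permutations
    print(p, standard, standard * factorial(p) * factorial(p - 1))
# prints 4 4 576, then 5 56 161280
```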

It is only necessary to specify the standard squares. To select a Latin square at 
random we choose a standard form at random and then permute rows and columns at 
random, the randomising process being most conveniently carried out by Sampling 
Numbers. For squares of order 8 or more, where the standard types have not been enumer-
ated, we can only choose one of the squares which are known, and hence select one at random
from a restricted set of all possible squares.
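
The selection procedure can be sketched for order 4. A standard square is picked at random, then all columns and all rows except the first are permuted at random. A pseudo-random generator replaces Sampling Numbers. As a check, the same permutations applied exhaustively to the four standard squares yield 576 distinct squares, which agrees with the count given above.

```python
import random
from itertools import permutations

# The four standard squares of order 4, as listed in the text.
STANDARD_4 = [["ABCD", "BADC", "CDBA", "DCAB"],
              ["ABCD", "BCDA", "CDAB", "DABC"],
              ["ABCD", "BDAC", "CADB", "DCBA"],
              ["ABCD", "BADC", "CDAB", "DCBA"]]

def derive(square, cols, rows):
    """New row i is old row rows[i]; new column j is old column cols[j]."""
    return tuple("".join(square[r][c] for c in cols) for r in rows)

def random_latin_square(standard_squares, rng=random):
    """Choose a standard square at random, then permute all its columns
    and all rows except the first at random."""
    square = rng.choice(standard_squares)
    p = len(square)
    cols = rng.sample(range(p), p)
    rows = [0] + rng.sample(range(1, p), p - 1)
    return derive(square, cols, rows)

def all_derived(standard_squares):
    """Every square obtainable by those permutations (used only as a check)."""
    p = len(standard_squares[0])
    return {derive(sq, cols, (0,) + rest)
            for sq in standard_squares
            for cols in permutations(range(p))
            for rest in permutations(range(1, p))}

square = random_latin_square(STANDARD_4, random.Random(3))
print("\n".join(square))
```

Because every Latin square arises from exactly one standard square and one such pair of permutations, choosing each stage uniformly gives each of the 576 squares an equal chance.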
Analysis of Variance for Latin Squares

25.31. We must now justify our assertion that the Latin square may be analysed
in the form (25.16), and that the z-test applies to the variance ratios which arise in the
analysis.

For an ordinary two-way classification we have

$$\sum (x_{jk} - \bar{x}_{..})^2 = \sum (\bar{x}_{j.} - \bar{x}_{..})^2 + \sum (\bar{x}_{.k} - \bar{x}_{..})^2 + \sum (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2 .$$

Thus, if $x_r$ is the mean of rows and $x_c$ that of columns in the Latin square, we have, writing
$\bar{x}$ for $\bar{x}_{..}$,

$$\sum (x_{rc} - \bar{x})^2 = \sum (x_r - \bar{x})^2 + \sum (x_c - \bar{x})^2 + \sum (x_{rc} - x_r - x_c + \bar{x})^2 \tag{25.18}$$

and the three parts on the right are distributed independently as $\sigma^2\chi^2$ with $p - 1$, $p - 1$ and
$(p - 1)(p - 1)$ degrees of freedom respectively.

Now

$$\sum (x_{rc} - x_r - x_c + \bar{x})^2 = \sum (x_t - \bar{x})^2 + \sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2 + 2\sum (x_t - \bar{x})(x_{rc} - x_r - x_c - x_t + 2\bar{x}) \tag{25.19}$$

where $x_t$ is the mean of treatments.

Consider the cross-product term in (25.19). The summation takes place over all $p^2$
values in the Latin square. Let us confine our attention to the summation for some par-
ticular treatment. For this summation the factor $x_t - \bar{x}$ is constant. Summation for
the other factor gives

$$\sum (x_{rc} - x_r - x_c - x_t + 2\bar{x}) = p x_t - \sum x_r - \sum x_c - p x_t + 2p\bar{x} \tag{25.20}$$

and since each treatment occurs once in each row and once in each column,

$$\sum x_r = p\bar{x}, \qquad \sum x_c = p\bar{x}, \tag{25.21}$$

and hence the sum (25.20) vanishes.

Thus the cross-product in (25.19) vanishes also and we have

$$\sum (x_{rc} - \bar{x})^2 = \sum (x_r - \bar{x})^2 + \sum (x_c - \bar{x})^2 + \sum (x_t - \bar{x})^2 + \sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2 . \tag{25.22}$$

This gives us the analysis of the sums of squares, and it only remains to show that the third
term on the right in (25.22) is independent of the fourth. It will then follow that the four
terms are distributed independently with p − 1, p − 1, p − 1 and (p − 1)(p − 2) degrees
of freedom.
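
The algebra can be confirmed numerically. The sketch lays out arbitrary random yields in the first standard square of order 4. It computes each sum over all $p^2$ cells, following the text's convention, and checks two things: that the four parts add up to the total sum of squares, and that the cross-product term of (25.19) vanishes. The yields and seed are arbitrary choices made for the sketch.

```python
import random

SQUARE = ["ABCD", "BADC", "CDBA", "DCAB"]    # first standard square of order 4
p = len(SQUARE)
rng = random.Random(0)
x = [[rng.gauss(50, 10) for _ in range(p)] for _ in range(p)]   # arbitrary yields

cells = [(r, c) for r in range(p) for c in range(p)]
xbar = sum(x[r][c] for r, c in cells) / p**2
row_m = [sum(x[r]) / p for r in range(p)]
col_m = [sum(x[r][c] for r in range(p)) / p for c in range(p)]
trt_m = {t: sum(x[r][c] for r, c in cells if SQUARE[r][c] == t) / p for t in "ABCD"}

def S(f):
    """Sum of f(r, c) over all p^2 cells of the square."""
    return sum(f(r, c) for r, c in cells)

def resid(r, c):
    """The residual term x_rc - x_r - x_c - x_t + 2 * xbar."""
    return x[r][c] - row_m[r] - col_m[c] - trt_m[SQUARE[r][c]] + 2 * xbar

total = S(lambda r, c: (x[r][c] - xbar) ** 2)
parts = (S(lambda r, c: (row_m[r] - xbar) ** 2)
         + S(lambda r, c: (col_m[c] - xbar) ** 2)
         + S(lambda r, c: (trt_m[SQUARE[r][c]] - xbar) ** 2)
         + S(lambda r, c: resid(r, c) ** 2))
cross = S(lambda r, c: (trt_m[SQUARE[r][c]] - xbar) * resid(r, c))
print(total, parts, cross)   # total equals parts; cross is zero up to rounding
```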

The required property of independence can be established directly, but it also follows 
from considerations of symmetry in the Latin square which have an interest of their own. 
We have regarded the square as composed of rows and columns, with treatments allotted 
in a certain way ; but by rearrangement we can equally well regard it as composed of rows 
and treatments with columns allocated in a certain way. For instance, if we take the
first standard square in (25.17) we may write it :

                         Treatment
                   A      B      C      D

Rows :    1        C1     C2     C3     C4
          2        C2     C1     C4     C3
          3        C4     C3     C1     C2
          4        C3     C4     C2     C1

where, for instance, treatment A occurs in row 1, column 1 (C1), row 2, column 2 (C2), and
so on. This, of course, is not a physical layout, but that is immaterial for present purposes.
It follows that since the sum of squares between columns is independent of the residual in
(25.22), so also is that between treatments.
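
The rearrangement used in this argument is mechanical. The sketch takes a square stored as strings of treatment letters, one string per row, and returns for each row the column (numbered from 1) in which each treatment stands. Applied to the first standard square of order 4, it reproduces the rows-by-treatments table, and that table is itself a Latin square in the column numbers.

```python
def rows_by_treatment(square):
    """Re-express a Latin square with rows and treatments as the classifications:
    entry [r][t] is the 1-based column in which treatment t stands in row r."""
    symbols = sorted(square[0])
    return [[row.index(t) + 1 for t in symbols] for row in square]

table = rows_by_treatment(["ABCD", "BADC", "CDBA", "DCAB"])
for r, entries in enumerate(table, start=1):
    print(r, " ".join(f"C{c}" for c in entries))
```

Since the roles of columns and treatments are simply exchanged, any property proved for "columns" in the original layout carries over to "treatments".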

The variance analysis then takes the form

                   Sum of squares                                      d.f.

Rows . . . . .     $\sum (x_r - \bar{x})^2$                              p − 1
Columns . . . .    $\sum (x_c - \bar{x})^2$                              p − 1
Treatments . . .   $\sum (x_t - \bar{x})^2$                              p − 1
Residual . . . .   $\sum (x_{rc} - x_r - x_c - x_t + 2\bar{x})^2$        (p − 1)(p − 2)

Totals . . . . .   $\sum (x_{rc} - \bar{x})^2$                           p² − 1        (25.23)


25.32. The above form provides a homogeneity test of the usual kind. If the test
proves significant of heterogeneity we may, in the usual way, consider the hypothesis that

$$x_{rc} = \alpha_r + \beta_c + \gamma_t + \zeta_{rc} \tag{25.24}$$

where $\alpha_r$, $\beta_c$ and $\gamma_t$ are constants for the row, column and treatment concerned, and
$\zeta_{rc}$ is normally distributed about zero mean. We leave it to the reader to show, as
in Chapter 23, that in such an event the residual mean square is an unbiassed estimate of
the variance of $\zeta$ with $(p - 1)(p - 2)$ degrees of freedom.


25.33. As in the case of randomised blocks, it appears that under certain general 
conditions the z-distribution is reproduced approximately for fixed values which are per- 
muted in all the permissible ways consistent with the Latin square design. We omit an 
investigation into this result (for which see Welch, 1937) as the algebra is considerably 
more complicated than for randomised blocks. The result has been confirmed by a limited 
number of experiments. 


Graeco-Latin and Orthogonal Squares

25.34. If the two squares

A B C D          A B C D
B A D C          C D A B
C D A B          D C B A          (25.25)
D C B A          B A D C

are superposed we have the arrangement

AA  BB  CC  DD
BC  AD  DA  CB
CD  DC  AB  BA          (25.26)
DB  CA  BD  AC

in which every possible pair of letters (XY being regarded as different from YX) appears
just once. Such a pair of squares is said to be orthogonal. The form (25.26) is sometimes
written with Greek letters instead of the second Roman set ; hence the name of Graeco-
Latin square. It is also possible to superpose a third factor which we will denote by the
numerals 1-4 in such a way that each combination of any pair of types occurs just
once, e.g.

Aα1   Bβ2   Cγ3   Dδ4
Bγ4   Aδ3   Dα2   Cβ1
Cδ2   Dγ1   Aβ4   Bα3          (25.27)
Dβ3   Cα4   Bδ1   Aγ2


Complete sets of orthogonal squares (i.e. those in which there are p − 1 factors for a p × p
square) are known for all prime p and for p = 4, 8 and 9. Curiously, there is no set for 
p = 6. Up to and including p = 7 they have been enumerated. 

We shall not enter here into the use of these squares in experimental design. They 
are generalisations of the Latin square in which, by suitable arrangements, several factors 
can be tried out simultaneously, so that all possible combinations of pairs occur an equal 
number of times. 
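
The defining properties of (25.27) can be checked mechanically. The sketch stores each cell of the three-factor square as a Roman letter, a Greek letter and a numeral, in that order, and splits it into three separate squares. It then confirms that each is a Latin square and that every pair is orthogonal, meaning that superposing them gives every ordered pair of symbols exactly once.

```python
from itertools import combinations

# The three-factor square (25.27): Roman letter, Greek letter, numeral in each cell.
GRAECO = [["Aα1", "Bβ2", "Cγ3", "Dδ4"],
          ["Bγ4", "Aδ3", "Dα2", "Cβ1"],
          ["Cδ2", "Dγ1", "Aβ4", "Bα3"],
          ["Dβ3", "Cα4", "Bδ1", "Aγ2"]]

def factor(square, k):
    """Extract the k-th factor (0 = Roman, 1 = Greek, 2 = numeral) as its own square."""
    return [[cell[k] for cell in row] for row in square]

def is_latin(sq):
    n = len(sq)
    return all(len(set(row)) == n for row in sq) and all(len(set(col)) == n for col in zip(*sq))

def orthogonal(a, b):
    """True if superposing a and b gives every ordered pair of symbols exactly once."""
    pairs = {(u, v) for ra, rb in zip(a, b) for u, v in zip(ra, rb)}
    return len(pairs) == len(a) ** 2

factors = [factor(GRAECO, k) for k in range(3)]
all_latin = all(is_latin(f) for f in factors)
mutually_orthogonal = all(orthogonal(a, b) for a, b in combinations(factors, 2))
print(all_latin, mutually_orthogonal)  # True True
```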


Confounding 

25.35. It will be evident that if we wish to consider in full a classification according 
to several variates, particularly with replications, the number of individual members in 
the sample may be very large. For instance, if we wish to test a variety of barley with 
three different applications of four types of fertiliser, there must be 81 yields even without 
replication, if we want to make all the comparisons possible. Physical considerations may 
make a layout of an experiment on such a scale impossible. The difficulty is possibly more 
serious in experiments on expensive animals such as cows. 

Where economy in the size of sample is a very material factor we may be able to reduce 
the sample at the expense of sacrificing some of the less important comparisons. For 
example, to consider once again the case of barley and the effect of fertilisers : we shall 
undoubtedly wish to compare yields of D and not-D, K and not-K, P and not-P, N and
not-N. We may also wish to compare first-order interactions of the type DK and not-D, K.
But it is quite possible that interactions of higher order, such as the effect of dung in the 
presence of two other fertilisers, are negligible. Where we are prepared to assume that this 
is so, on the basis of prior evidence or otherwise, we can dispense with certain information 
and still make the comparisons we wish while retaining properties of orthogonality.* 

25.36. Consider, as an illustration, an experiment with three fertilisers, each of which 
is applied or not applied, say N, P and K, and four replications. In the ordinary way
there would be 32 plots and we should have an analysis of variance as follows, assuming
that block-treatment interactions may be regarded as part of the residual :


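                          Sum of squares     d.f.

Blocks . . . . . . . . .                       3
N  . . . . . . . . . . .                       1
P  . . . . . . . . . . .                       1
K  . . . . . . . . . . .                       1
NP . . . . . . . . . . .                       1
NK . . . . . . . . . . .                       1
PK . . . . . . . . . . .                       1
NPK  . . . . . . . . . .                       1
Residual . . . . . . . .                      21

Total  . . . . . . . . .                      31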





Now suppose that we divide our main blocks into two sub-blocks, the first containing 
the treatments 

O (None), NP, NK, PK (25.28) 

and the second the treatments 

N, P, K, NPK (25.29) 

We may then analyse the variance as follows, regarding the sub-blocks as blocks of four
plots each : — 


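                          Sum of squares     d.f.

Blocks . . . . . . . . .                       7
N  . . . . . . . . . . .                       1
P  . . . . . . . . . . .                       1
K  . . . . . . . . . . .                       1
NP . . . . . . . . . . .                       1
NK . . . . . . . . . . .                       1
PK . . . . . . . . . . .                       1
Residual . . . . . . . .                      18

Total  . . . . . . . . .                      31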


In fact, if we wish to compare the yields with N and those without N, i.e.

$$ N + NPK + NP + NK $$

with

$$ O + PK + P + K, $$

it will be seen that we add two members from (25.28) and two from (25.29), so the difference 
is not affected by block differences ; and similarly for the other comparisons. Such a 
design is said to be balanced, and the interaction NPK is confounded with block-differences,
since in the eight blocks it cannot now be isolated from block effects. The advantage of 
the second design over the first is that, without losing anything appreciable in comparisons 
between treatments, we have gained a good deal in the assessment of block effects ; for the 
residual has only declined from 21 to 18 d.f. whereas the sum of squares between blocks 
has increased from 3 to 7 d.f. 
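The balance claim can be checked mechanically. The sketch below is an illustration only and is not part of the original treatment. It assumes the usual +1/-1 coding of 2^3 factorial contrasts: a treatment's coefficient in an effect is the product, over the letters of that effect, of +1 if the fertiliser is applied and -1 if not. The no-fertiliser treatment O is written as an empty string. The sketch sums each contrast's coefficients over sub-block (25.28). A zero sum means the contrast takes two plus and two minus members from each sub-block, so block differences cancel. A non-zero sum marks a confounded effect.

```python
from itertools import product

FACTORS = "NPK"
EFFECTS = ["N", "P", "K", "NP", "NK", "PK", "NPK"]

# Sub-blocks (25.28) and (25.29); the treatment O (no fertiliser) is written "".
SUB_BLOCK_1 = {"", "NP", "NK", "PK"}
SUB_BLOCK_2 = {"N", "P", "K", "NPK"}


def all_treatments():
    """The 8 treatment combinations of N, P and K, each written as a string."""
    return {"".join(f for f, on in zip(FACTORS, bits) if on)
            for bits in product([False, True], repeat=len(FACTORS))}


def coefficient(effect, treatment):
    """+1 or -1: the treatment's coefficient in the contrast for `effect` (e.g. "NK")."""
    sign = 1
    for factor in effect:
        sign *= 1 if factor in treatment else -1
    return sign


def sub_block_sum(effect, block=SUB_BLOCK_1):
    """Sum of the effect's contrast coefficients over one sub-block (0 means balanced)."""
    return sum(coefficient(effect, t) for t in block)


if __name__ == "__main__":
    for effect in EFFECTS:
        print(effect, sub_block_sum(effect))
```

With this coding every main effect and two-factor interaction gives 0. NPK gives -4: all four members of (25.28) carry the coefficient -1, so that contrast coincides with the sub-block difference.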


25.37. The ideas of orthogonality, randomisation, balance and confounding have 
been developed to an advanced degree and with great ingenuity, particularly by Fisher 
and Yates. The slight sketch we have given of the methods in this chapter is intended to 
be no more than illustrative of the way in which the theory of experimental design is capable 
of development, at least in certain fields, and the manner in which efficiency may be imported 
into a practical inquiry by a due regard to theoretical requirements of the design. For a 
comprehensive account of this branch of the subject the reader should consult Fisher’s 
Statistical Methods and Design of Experiments, Yates (1937b), and a useful introductory
account by Goulden (1939). At this point we leave these particular topics and return to 
certain general matters. 

Design and Randomisation 

25.38. Whenever an inference is to be made, and particularly where hypothetical 
populations are concerned, the reader will find it useful to ask himself what precisely is the 
population under consideration. We can illustrate the point very usefully by discussing 
a subject on which there has recently been difference of authoritative opinion, that of
occasional conflict between the requirements of balancing and randomisation. 

25.39. Consider in the first place the testing of a cereal under two treatments, denoted
by A and B ; and to simplify matters as much as possible, suppose we are to sow eight
plots in a straight line. In what order shall we allot the treatments ?

If the plots are so large that the row covers a big area, it is quite possible that
there may be a trend of fertility in the soil itself which will affect yields differentially and 
hence interfere with comparisons which we might make. Suppose that we do wish to 
guard against a fertility gradient so far as possible. We might then decide on one of the 
“ balanced ” arrangements : 

AABBBBAA (25.30)

ABBAABBA (25.31)

ABABBABA (25.32)

As will be easily seen, if there is a linear gradient in fertility along the row the means of
A and B treatments respectively will be affected to the same extent and hence their
difference unaffected. For instance, consider (25.30) and suppose the linear gradient is
represented by an additive factor $q + kp$, $k = 1, \ldots, 8$. On the hypothesis that the
remaining effect consists of a constant $a$ for A-treatments with a normal residual $\xi$, and
similarly for B, the yields are

$$
\begin{aligned}
\text{A-treatments} &: \; q + p + a + \xi_1, \quad q + 2p + a + \xi_2, \quad q + 7p + a + \xi_7, \quad q + 8p + a + \xi_8, \\
\text{B-treatments} &: \; q + 3p + b + \xi_3, \quad q + 4p + b + \xi_4, \quad q + 5p + b + \xi_5, \quad q + 6p + b + \xi_6,
\end{aligned}
$$

with means

$$
\tfrac{1}{4}(4q + 18p) + a + \tfrac{1}{4}(\xi_1 + \xi_2 + \xi_7 + \xi_8),
\qquad
\tfrac{1}{4}(4q + 18p) + b + \tfrac{1}{4}(\xi_3 + \xi_4 + \xi_5 + \xi_6)
$$

respectively. The difference of these two is independent of $q$ and $p$.
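The cancellation can also be checked numerically. The sketch below is illustrative only; the function name and the default values of q and p are arbitrary. It keeps just the gradient term q + kp of each plot and drops the treatment constants and the residuals. It then reports how much of mean(A) minus mean(B) the gradient alone contributes.

```python
def gradient_bias(layout, q=3.0, p=0.5):
    """Contribution of an additive gradient q + k*p (k = 1, 2, ...) to mean(A) - mean(B).

    Treatment constants and residuals are omitted, so a result of 0 means the
    layout is unaffected by any linear fertility gradient.
    """
    a = [q + k * p for k, t in enumerate(layout, start=1) if t == "A"]
    b = [q + k * p for k, t in enumerate(layout, start=1) if t == "B"]
    return sum(a) / len(a) - sum(b) / len(b)


if __name__ == "__main__":
    for layout in ["AABBBBAA", "ABBAABBA", "ABABBABA", "AAAABBBB"]:
        print(layout, gradient_bias(layout))
```

With these values the three balanced arrangements give 0.0. The segregated layout AAAABBBB gives -2.0, that is -4p, so the gradient passes straight into the treatment comparison.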

25.40. The alternative procedure in allotting treatments would be to distribute 
them at random. Such balanced arrangements as (25.30)-(25.32) might then arise by
chance. But we might also get such an arrangement as

AAAABBBB (25.33)

What are we to do in such circumstances ? If we reject this arrangement we are rejecting
the random allocation of treatments in favour of systematisation. If we accept it we
know quite well that a fertility gradient, if it exists, will invalidate the inquiry.

The reader will no doubt agree that, if other things are equal, the balanced arrangement
is better than the arrangement (25.33). What we have to examine is whether other
things are equal ; in short, whether in rejecting randomisation we have lost anything
useful in the testing of significance.
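How often does randomisation produce each kind of layout ? The short enumeration below is a sketch under one stated assumption: random allocation means choosing which four of the eight plots receive A, with every choice equally likely. It lists all such layouts. It marks as balanced those in which A and B plots have the same mean position, so that a linear gradient cancels as it does for (25.30)-(25.32). It also picks out the two completely segregated layouts of the type (25.33).

```python
from itertools import combinations


def layouts(n_plots=8, n_a=4):
    """Every allocation of n_a A-plots (the rest B) to a row of n_plots plots."""
    for chosen in combinations(range(1, n_plots + 1), n_a):
        yield "".join("A" if k in chosen else "B" for k in range(1, n_plots + 1))


def is_gradient_free(layout):
    """True if A and B occupy positions with equal sums (equal numbers of each assumed)."""
    sum_a = sum(k for k, t in enumerate(layout, start=1) if t == "A")
    sum_b = sum(k for k, t in enumerate(layout, start=1) if t == "B")
    return sum_a == sum_b


all_layouts = list(layouts())
balanced = [s for s in all_layouts if is_gradient_free(s)]
segregated = [s for s in all_layouts if s in ("AAAABBBB", "BBBBAAAA")]

if __name__ == "__main__":
    print(len(all_layouts), len(balanced), len(segregated))
```

On this assumption 8 of the 70 layouts are balanced and 2 are fully segregated. A random allocation therefore lands on a layout like (25.33) with probability 2/70, about 0.029.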


25.41. Consider a rather more general case in which an experimental area is laid
out in $p$ blocks of $q$ treatments each. If the subscript $j$ refers to blocks and $k$ to
treatments, we have the usual analysis with sum of squares between blocks ($p - 1$ d.f.),
between treatments ($q - 1$ d.f.), and residual ($(p - 1)(q - 1)$ d.f.).

Now we have seen that if the individual plot-yield can be regarded as a block effect
plus a treatment effect plus a normal residual with constant variance from plot to plot,
the significance of treatment effects can be judged from the z-test in the usual way by
comparing the sum of squares between treatments with the residual sum of squares. This
is true whether treatments are allocated at random or not.

But suppose we wish to adopt the alternative viewpoint of 23.41 and make the infer- 
ence in the set of values obtained by permuting the observed values. These permutations 
will not affect the block means or the total mean, and hence the sum of squares between 
blocks remains constant. The remaining part of the analysis may be written :


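$$
\begin{array}{lll}
 & \text{Sum of squares} & \text{d.f.} \\
\text{Treatments} & S_1 = \sum_{j,k} (\bar{x}_{.k} - \bar{x}_{..})^2 & q - 1 \\
\text{Residual} & S_2 = \sum_{j,k} (x_{jk} - \bar{x}_{j.} - \bar{x}_{.k} + \bar{x}_{..})^2 & (p - 1)(q - 1) \\
\text{Totals} & \sum_{j,k} (x_{jk} - \bar{x}_{j.})^2 & p(q - 1)
\end{array}
\tag{25.34}
$$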


Rather remarkably, the z-test holds for the ratio

$$ \frac{S_1}{q - 1} \bigg/ \frac{S_2}{(p - 1)(q - 1)}, $$

provided that treatments are allocated at random, independently of the distribution of 
residual effects in individual plots. 
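The quantities of (25.34) can be computed directly. The sketch below takes a p-by-q table (blocks as rows, treatments as columns) and returns the block, treatment and residual sums of squares and the variance ratio. The yields are invented for illustration. Reporting Fisher's z = (1/2) log F alongside F is an addition of this sketch.

```python
import math


def block_anova(x):
    """Analysis of a p-by-q randomised-block table x[j][k] (block j, treatment k).

    Returns the block, treatment (S1) and residual (S2) sums of squares of
    (25.34), the variance ratio F = [S1/(q-1)] / [S2/((p-1)(q-1))] and
    Fisher's z = (1/2) log F.
    """
    p, q = len(x), len(x[0])
    grand = sum(map(sum, x)) / (p * q)
    block_means = [sum(row) / q for row in x]                              # x_j.
    treat_means = [sum(x[j][k] for j in range(p)) / p for k in range(q)]   # x_.k
    s_blocks = q * sum((b - grand) ** 2 for b in block_means)
    s1 = p * sum((t - grand) ** 2 for t in treat_means)
    s2 = sum((x[j][k] - block_means[j] - treat_means[k] + grand) ** 2
             for j in range(p) for k in range(q))
    f = (s1 / (q - 1)) / (s2 / ((p - 1) * (q - 1)))
    return {"blocks": s_blocks, "S1": s1, "S2": s2, "F": f, "z": 0.5 * math.log(f)}


# Hypothetical yields: 3 blocks (rows) by 3 treatments (columns), invented for illustration.
yields = [[10.0, 12.0, 15.0],
          [11.0, 14.0, 16.0],
          [9.0, 13.0, 14.0]]
result = block_anova(yields)
```

S1 + S2 always equals the within-blocks total of (25.34). That identity drives the argument of the next section.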

25.42. Consider, then, the population of values, $(q!)^{p-1}$ in number, obtained by
permuting the observed values. The total sum of squares in (25.34) is the same for all
members. Consequently if $S_1$ is too great, $S_2$ must be too small, and vice versa ; and in
general, if we confine ourselves to certain layouts and reject others, not all the possible
values of $S_1$ can appear. It is this fact which has been seized on by advocates of
randomisation. They point out that for balanced layouts $S_1$ tends to be smaller than for
random layouts (a conclusion supported by experiment) ; consequently that the test of
significance is invalidated and the estimate of error $S_2$ too big. The difference between
the two modes of thought may be expressed briefly in this way : with balanced layouts the
real error is reduced but the estimate of error is too large, so that the significance of the
result is more in doubt ; whereas with random layouts the estimate of error is exact but the
error itself may be larger. The question is whether one prefers to be nearer the truth
without knowing how near, or farther from the truth with a knowledge of the limits of error.
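For small tables the permutation population can be enumerated outright. The sketch below holds block 1 fixed and permutes the observed values within each of the other blocks, giving $(q!)^{p-1}$ layouts. It records S1 and S2 for each layout and computes the share of layouts whose S1 is at least the observed value. Because S1 + S2 is constant, this share equals the share whose variance ratio is at least the observed ratio. The yields are invented for illustration, and the function names are this sketch's own.

```python
from itertools import permutations, product


def s1_s2(x):
    """Treatment (S1) and residual (S2) sums of squares of a p-by-q table, as in (25.34)."""
    p, q = len(x), len(x[0])
    grand = sum(map(sum, x)) / (p * q)
    block_means = [sum(row) / q for row in x]
    treat_means = [sum(x[j][k] for j in range(p)) / p for k in range(q)]
    s1 = p * sum((t - grand) ** 2 for t in treat_means)
    s2 = sum((x[j][k] - block_means[j] - treat_means[k] + grand) ** 2
             for j in range(p) for k in range(q))
    return s1, s2


def randomisation_distribution(x):
    """(S1, S2) for every re-allocation of the observed values within blocks.

    Block 1 is held fixed and blocks 2..p are permuted, giving (q!)**(p-1)
    layouts. Letting block 1 move as well only adds a relabelling of the
    treatments, so it repeats these same (S1, S2) values q! times over.
    """
    q = len(x[0])
    pairs = []
    for perms in product(permutations(range(q)), repeat=len(x) - 1):
        table = [list(x[0])] + [[row[i] for i in perm] for row, perm in zip(x[1:], perms)]
        pairs.append(s1_s2(table))
    return pairs


def randomisation_p_value(x, tol=1e-12):
    """Share of layouts whose S1 is at least the observed S1 (tol absorbs rounding)."""
    observed_s1, _ = s1_s2(x)
    pairs = randomisation_distribution(x)
    return sum(s1 >= observed_s1 - tol for s1, _ in pairs) / len(pairs)


# Hypothetical yields: 3 blocks (rows) by 3 treatments (columns), invented for illustration.
yields = [[10.0, 12.0, 15.0],
          [11.0, 14.0, 16.0],
          [9.0, 13.0, 14.0]]
pairs = randomisation_distribution(yields)
```

Every entry of `pairs` has the same S1 + S2, which is the fact stated above. Rejecting some layouts therefore removes some values of S1 from this population.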

25.43. For details of the controversy on this topic the reader may consult the papers
referred to at the end of the chapter. It brings into prominence an important question
of inference which can only be decided by the experimenter himself. If he chooses to
regard any act of experimentation as one of a large population of such acts, to be carried
out by himself or other workers, he may prefer randomisation in all circumstances,
notwithstanding that every now and again he will hit by chance on a design which he knows
is likely to give misleading results. But if he cannot take this very detached attitude (and
most experimenters, being human, would think it poor compensation that their own errors
are balanced by the better luck of other people) then he will prefer to design a balanced
layout, even if the exactitude of his tests of significance is impaired.
25.44. We must, however, not leave the reader with the impression that the
desiderata of both schools of thought are totally incompatible. It frequently happens that
one can select a design which is both balanced and random. The Latin square is a good
example. By imposing the restriction that a treatment must not appear more than once
in a row or column we remove to some extent the interference of fertility gradients ; by
requiring that it shall appear just once we balance the design ; and by leaving the rest
of the layout to be determined by a random selection from all possible Latin squares of
that order we randomise so as to reproduce the distribution of the variance ratio in the
required form, thus, as “ Student ” remarked, “ conforming to all the principles of allowed
witchcraft ”. 
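The sketch below randomises a Latin square, with one simplification stated plainly. The text describes a choice made at random from all Latin squares of the given order. This sketch instead takes the cyclic square and randomly permutes its rows, columns and treatment labels. That keeps the Latin property, but for orders 4 and above it reaches only part of the full set of squares. A uniform choice over all squares needs a more elaborate method. The function names are hypothetical.

```python
import random


def random_latin_square(n, rng=random):
    """An n-by-n Latin square on treatments 0..n-1, obtained by randomising the cyclic square.

    Starts from L[i][j] = (i + j) mod n and applies independent random permutations
    to rows, columns and treatment labels. Every result is a Latin square, but for
    n >= 4 this does not choose uniformly among all Latin squares of order n.
    """
    rows = rng.sample(range(n), n)
    cols = rng.sample(range(n), n)
    labels = rng.sample(range(n), n)
    return [[labels[(r + c) % n] for c in cols] for r in rows]


def is_latin(square):
    """True if every treatment occurs exactly once in each row and each column."""
    n = len(square)
    full = set(range(n))
    rows_ok = all(len(row) == n and set(row) == full for row in square)
    cols_ok = all({square[i][j] for i in range(n)} == full for j in range(n))
    return rows_ok and cols_ok


if __name__ == "__main__":
    for row in random_latin_square(5, random.Random(2024)):
        print(" ".join("ABCDE"[t] for t in row))
```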
EXERCISES

25.1. A population is given by specifying the frequencies in comparatively narrow
ranges of one variate, the frequency in the $i$th range being $N_i$ and the ranges being of equal
width. Show that if the population frequencies are large, the best estimator of the mean
of a second variate which is linearly related to the first (in the sense of the unbiassed estimator
of minimum variance) in a sample obtained by taking $n_i$ members from the $i$th range is
given when $n_i$ is proportional to $N_i$.

25.2. Extend the result of the previous exercise to the case where ranges are of 
unequal width. 

If the number of farms in England and Wales is known in the acreage ranges 0-49,
50-99, 100-199, 200-499, 500 and over, what sampling proportions would you take in the
various ranges to estimate the total acreage under wheat ?
25.3. If a variate $\xi$ can be regarded as the sum of a systematic component $\xi(x)$ and
an uncorrelated random component $\zeta_1$, and $\eta$ similarly as $\eta(x) + \zeta_2$, and if the random
components are uncorrelated with each other, show that

$$ r(\xi, \eta) = \frac{\operatorname{cov}\{\xi(x), \eta(x)\}}{\{(\operatorname{var}\xi(x) + \operatorname{var}\zeta_1)(\operatorname{var}\eta(x) + \operatorname{var}\zeta_2)\}^{1/2}}. $$

Hence, if a population is divided into strata the correlation between $\xi$ and $\eta$ for these strata
will, in general, be less than that obtained by combining strata to obtain larger units ;
and as the strata are further subdivided the correlation between $\xi$ and $\eta$ tends to zero.

(Spearman, 1907, Am. J. Psych., 18 ; Wold, 1938a.) 

25.4. Illustrate the effect of the foregoing exercise by calculating the correlation 
coefficients for the data of Table 14.4 (vol. I, p. 333), (a) by adding the variates in pairs 
and so obtaining 24 values ; (b) by repeating the operation and obtaining 12 values ;
and (c) by repeating the operation and so obtaining 6 values. 

25.5. (Markoff’s theorem.) Consider a sample of $n$ independent values $x_1, \ldots, x_n$,
$x_i$ being drawn from a population $\Pi_i$ with mean $\mu_i$ and variance $\sigma_i^2$. Suppose we have
a function $\theta$ defined by

$$ \theta = \sum_j b_j p_j, $$

where the $b$’s are known and the parameters $p_j$ depend on the $\mu$’s according to the equation

$$ \mu_i = \sum_j a_{ij} p_j, $$

the $a$’s also being known. Then an unbiassed estimator of $\theta$, say $t$, with minimum variance
may be written

$$ t = \sum_{i=1}^{n} \lambda_i x_i. $$

Show that the function $t$ is given by substituting for the $p$’s in the expression for $\theta$ the
functions $q$ given by minimising

$$ \sum_{i=1}^{n} \frac{1}{\sigma_i^2}\Big(x_i - \sum_j a_{ij} q_j\Big)^2 $$

with regard to the $q$’s considered as independent variables.

Show further that if this minimum value is $S$, the estimated variance of $t$ is

$$ \frac{S}{n - s} \sum_i \lambda_i^2 \sigma_i^2, $$

where $s$ is the number of parameters $p_j$.


25.6. In an experiment there are given five different foods, each of which is
available in four grades. It is desired to feed each animal with one grade of each food,
but only one, so that a comparison may be made of the effect of the different grades of any
particular food. Use the Graeco-Latin square to show how the feeding can be carried
out.
25.7. A water diviner is to be taken to ten spots and asked to say whether water
is present below the surface. It is decided to choose five spots where water is known for
certain to exist and five where it is known not to exist. The order in which the spots are
to be presented is determined by spinning a coin, heads denoting water and tails not-water.

The spinning of the coin results in the first five trials giving heads. Would you
accept this result or spin again ?

25.8. Show that a Latin square may be regarded as a three-way classification in
which $p^2$ members are not zero, but $p^3 - p^2$ members vanish. Derive the analysis of
variance for the Latin square from this approach and generalise it to the Graeco-Latin
square.



CHAPTER 26 

GENERAL THEORY OF SIGNIFICANCE-TESTS— (1) 

Hypotheses to be Considered 

26.1. The kind of hypothesis which we test in statistics is more restricted than the
general scientific hypothesis. It is a scientific hypothesis that every particle of matter
in the universe attracts every other particle, or that Homer was blind ; but these are not
hypotheses such as arise for testing from the statistical viewpoint. A review of the various
tests which have been introduced earlier in this book indicates that the great majority
specify something about a population. Some merely assert a general fact such as “ the
population is continuous ” or “ the population is rectangular ”. Others are more definite,
as for instance “ the population is normal and has a mean ” ; and again others are less
definite in one direction and more definite in another, e.g. “ the population has unit
variance ”. It is also usually a part of the hypothesis that the sample from which the
inference is being made was obtained by a random process.

26.2. Suppose we have a set of random variables $x_1, \ldots, x_n$. In the sample space
$W$ of $n$ dimensions the sample-point whose co-ordinates are $x_1, \ldots, x_n$ determines a point
$E$, say, with a distribution function which we may write as $P(E)$. If $w$ is any region in
$W$, we may derive the probability that $E$ falls in $w$, say $P(E \in w)$. Then we shall say that
any hypothesis concerning the law $P(E \in w)$ is a statistical hypothesis. If it determines
the law completely we shall call it simple. In the contrary case it is said to be composite.

For instance, in testing the significance of the mean of a sample of $n$, it is a statistical
hypothesis that the parent is normal. This is composite, as also is the hypothesis that
the parent is normal with mean $\mu$ or the hypothesis that the parent is normal with variance
$\sigma^2$. The hypothesis that the parent is normal with mean $\mu$ and variance $\sigma^2$ is simple because
then the parent is fully determined.

Example 26.1

In sampling from a population dichotomised into classes possessing the attributes
A or not-A, say in proportion $\varpi$ and $\chi$ ($= 1 - \varpi$), the sampling distribution is the binomial
$(\chi + \varpi)^n$. This is completely determined by the value of $\varpi$, and hence a hypothesis as
to the value of $\varpi$ is simple. Such, for instance, would be the hypothesis that male and
female births occur in equal proportions. Similarly, in a multiple classification with
proportions $\varpi_1, \varpi_2, \ldots, \varpi_s$, a simple hypothesis would specify values for all the $\varpi$’s ; if only
one were specified and $s$ were greater than two the hypothesis would be composite.

In sampling from a bivariate normal population characterised by two means, two
variances and a correlation, a hypothesis about any one parameter would be composite,
and similarly for a hypothesis concerning two, three or four parameters. Only if all five
were specified in addition to the normality of the parent would the hypothesis be simple ;
and this notwithstanding the fact that the sampling distribution of the means is
independent of the other three parameters, and that of the correlation coefficient independent
of the other four.
26.3. A hypothesis which determines the law $P(E \in w)$ completely except for $\nu$
parameters is sometimes said to have $\nu$ degrees of freedom. Such a hypothesis may be
regarded as an aggregate of simple hypotheses. For instance, the hypothesis that a
population is normal with mean $\mu$ is the aggregate, for all $\sigma^2$, of hypotheses that it is normal with
mean $\mu$ and variance $\sigma^2$.

26.4. The kind of argument we have used in testing hypotheses, for both large and
small samples, is of this character : assuming that the hypothesis is true, we can, with
any assigned probability $\alpha$, find a region $W - w_0$ in the sample space $W$ such that the probability
of $E$ falling in $W - w_0$ is $\alpha$. We call $W - w_0$ the region of acceptance and the complementary
domain $w_0$ the critical region. (This is the nomenclature of Chapter 19.) If our observed
$E$ falls in $w_0$ we reject the hypothesis ; if not we accept it. As a rule, in practical cases,
our regions $w_0$ are determined by the values of some statistic such as $\bar{x}$ in testing the mean.

Errors of First and Second Kind 

26.5. In general, as we saw in Chapter 19, there are many possible regions of
acceptance for any given hypothesis and any given probability level $\alpha$. For all of them we shall
err in proportion $1 - \alpha$ of the cases in the long run by rejecting the hypothesis if $E$ falls
in the critical region, provided that the hypothesis is true. But what about the case when
it is not true ? We cannot ignore this case, for its possible existence is the very reason for
carrying out the test. It is of no use whatever to know merely what the test will do when
the hypothesis is true without regard to its behaviour in the contrary case ; for if we are
to consider only the events which happen when the hypothesis is true we have no right to
use a test based on that assumption to reject it.

By having regard to the behaviour of the test when the hypothesis is not true we are
able to lay down criteria for choosing among the various tests obeying the rule

$$ P\{E \in w_0 \mid H_0\} = 1 - \alpha, \tag{26.1} $$

where $H_0$ is the hypothesis. In fact we shall seek for the test which, while obeying (26.1),
minimises the risk of accepting $H_0$ when an alternative hypothesis $H_1$ is true and $H_0$
accordingly is false. That is to say, we shall endeavour to find $w_0$ such that, in addition to (26.1),
we also have

$$ 1 - P\{E \in w_0 \mid H_1\} = \text{minimum}. \tag{26.2} $$

26.6. From a slightly different viewpoint we may say that there are two possible
errors in judging a statistical hypothesis :

(a) We may reject it when we ought to accept it, that is, when it is true.

(b) We may accept it when we ought to reject it, that is, when it is false.

These are known as errors of the first and second kind respectively. The error of the
first kind we can control exactly by setting up the proper region of acceptance determined
by $\alpha$. Errors of the second kind cannot be controlled in this way, but we can sometimes
calculate their probabilities, and in any case can try to reduce them to a minimum. This
is the fundamental idea, first given explicit expression by Neyman and E. S. Pearson,
which determines most of the work in the present and succeeding chapters.

26.7. The possibility of finding regions of acceptance obeying (26.2) clearly depends
on a precise specification of what alternative hypotheses are under consideration. We
had better emphasise the importance of this point. It is customary to speak, and even,
in a loose kind of way, to think, of testing a hypothesis without reference to alternatives.
To take the case of testing for normality, we often say that the hypothesis under test is
that the population is normal without specifying what other form it might have. The
reader may say that the alternative he has in mind is merely the negation of the hypothesis,
namely that the population is not normal. But if so he will find it very difficult (in my
own view impossible) to justify any of his tests on a logical basis. He will calculate certain
statistics and accept the hypothesis if their values are consonant with the normal values ;
but it will always be possible to find other populations for which the observed values are
even closer to expectation. If agreement between theoretical and observed values is the
criterion he should reject normality in favour of these alternative hypotheses. It is not
until he specifies his alternatives and considers errors of the second kind that some firm
foundation for intuitive processes begins to appear.

26.8. Perhaps it may help to clarify the fundamental concepts of the present approach
if we take a simple illustration where the hypothesis under test $H_0$ is simple and there
is only one alternative $H_1$, which is also simple. In Fig. 26.1 we show diagrammatically the
scatter of sample-points which would arise in samples of two, $x_1$ and $x_2$, the cluster on the
right being that due to $H_0$ and the one on the left that due to $H_1$. In practice, of course, the
sampling distributions are more usually continuous, but the dots will indicate roughly the
condensation of sample density round central values.

In determining the critical region we have to find an area in the $(x_1, x_2)$ plane such that
its “ content ” is $1 - \alpha$. Two possible areas are shown, $w_0$ being the area to the left of
the line PQ, and $w_0'$ the area between the lines AB and BC. In either case the proportion
in the critical regions of the frequency on hypothesis $H_0$ is $1 - \alpha$, and if we reject $H_0$
whenever the sample-point falls in $w_0$ (and similarly for $w_0'$) we shall commit an error of the
first kind in proportion $1 - \alpha$ of the cases in the long run.

Consider errors of the second kind. By using the region $w_0$ we should reject $H_0$, and
therefore accept $H_1$, every time the sample-point arose from $H_1$, that is to say in practically
all the cases where $H_1$ was true, since nearly all the sample-points arising from $H_1$ lie in
$w_0$. Errors of the second kind are therefore very rare. On the other hand, if we were to
use $w_0'$ we should accept $H_0$ every time a sample-point arose from $H_1$ but did not fall between
the lines AB and BC, that is to say fairly frequently. Clearly $w_0$ is the better critical
region and has a much smaller error of the second kind than $w_0'$.

26.9. It is to be noted that the argument does not depend on the relative frequencies
of occurrence of the hypotheses $H_0$ and $H_1$. This is generally true. There is no concealed
form of Bayes’ postulate in this approach.

26.10. When there are $n$ variates and $p$ unknown parameters the geometrical
representation can be extended by imagining a sample-space $W$ of $n$ dimensions adjoined to
a parameter space of $p$ dimensions. We cannot draw a picture of such a case on a
two-dimensional sheet of paper, but the geometrical imagery and terminology of the method
are frequently useful. A graphical illustration of a two-dimensional sample-space and
a one-dimensional parameter space has already been given in Fig. 19.3.

The Power Function 

26.11. If for a simple hypothesis $H_0$ (26.1) is true we define

$$ P\{E \in w_0 \mid H_1\} = \beta(H_1 \mid w_0) \tag{26.3} $$

as the power of the critical region $w_0$ with respect to $H_1$. Clearly the power is greatest
when the probability of an error of the second kind is least.

In the expression on the left of (26.3) we regard the probability that $E$ falls in $w_0$ as
dependent on $H_1$, the hypothesis alternative to $H_0$. In the expression on the right we
regard the power of the test for $H_1$ as dependent on $w_0$.

If there exists a particular region with greater power than any other region obeying
(26.1) we shall say that it is the best critical region, and the test based on it will be called
the most powerful test.


26.12. We proceed to consider in turn the following cases :

(a) $H_0$ simple ; one alternative $H_1$ which is simple.

(b) $H_0$ simple ; an alternative $H_1$ which is composite but can be regarded as an aggregate
of simple alternatives.

(c) $H_0$ and $H_1$ composite but expressible as aggregates of simple hypotheses.


Simple Hypotheses : One Simple Alternative


26.13. Suppose the parent population is continuous, so that the simultaneous
distribution of the $n$ sample values $x_1, \ldots, x_n$ is continuous ; and let the frequency functions
of the sample values on hypotheses $H_0$ and $H_1$ be $p_0(x_1, \ldots, x_n)$ and $p_1(x_1, \ldots, x_n)$
respectively. Write $dx$ for the element $dx_1 \ldots dx_n$. Then we have

$$ \int_{w_0} p_0 \, dx = 1 - \alpha \tag{26.4} $$

and we wish to maximise, for variations in the domain $w_0$, the integral

$$ \int_{w_0} p_1 \, dx. \tag{26.5} $$
This is a problem in the Calculus of Variations and is equivalent to maximising
unconditionally the integral

$$ \int_{w_0} (k p_1 - p_0) \, dx \tag{26.6} $$

or, what is the same thing, to minimising

$$ \int_{w_0} (p_0 - k p_1) \, dx, \tag{26.7} $$

where $k$ is a constant to be determined by (26.4).

It is known that the condition for a stationary value of (26.7) is that, on the
boundary of $w_0$,

$$ p_0 - k p_1 = 0. \tag{26.8} $$

If the solution is a minimum we have, inside $w_0$,

$$ p_0 \le k p_1 \tag{26.9} $$

and outside $w_0$,

$$ p_0 > k p_1. \tag{26.10} $$

This solution to the problem is fairly obvious on general grounds. If $U$ is a function which
is sometimes positive and sometimes negative, with a line of demarcation where it is zero
(as must exist in virtue of continuity), we clearly minimise $\int U$ by taking into the region
$w_0$ all the points for which $U$ is negative and no more. This gives us (26.9) and (26.10),
and the boundary of $w_0$ is the locus for which $U$ vanishes. By convention we regard the
boundary as included in $w_0$, which accounts for the equality in (26.9) and its absence in
(26.10).


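26.14. The conditions expressed by (26.8), (26.9) and (26.10) are sufficient as well
as necessary. For let $w_1$ be any other region for which

$$ \int_{w_1} p_0 \, dx = 1 - \alpha. $$

If $w_0$ and $w_1$ have a common part, denote it by $w_{01}$. Then

$$ \int_{w_0 - w_{01}} p_0 \, dx = 1 - \alpha - \int_{w_{01}} p_0 \, dx = \int_{w_1 - w_{01}} p_0 \, dx, $$

and hence, from (26.9) applied inside $w_0$ and (26.10) applied to $w_1 - w_{01}$, which lies outside $w_0$,

$$ k \int_{w_0 - w_{01}} p_1 \, dx \;\ge\; \int_{w_0 - w_{01}} p_0 \, dx \;=\; \int_{w_1 - w_{01}} p_0 \, dx \;>\; k \int_{w_1 - w_{01}} p_1 \, dx. $$

Adding $k \int_{w_{01}} p_1 \, dx$ to both sides, we have

$$ k \int_{w_0} p_1 \, dx > k \int_{w_1} p_1 \, dx, \tag{26.11} $$

and hence, for positive $k$, the power of $w_1$ is less than that of $w_0$ and the latter is the best
critical region.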

Both in this section and implicitly in the last we have required $k$ to be positive. That
it must be so if $w_0$ is to exist emerges from (26.8), for $p_0$ and $p_1$ are essentially non-negative,
and if $k$ were negative no solution for real variate-values would exist.
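The content of 26.13 and 26.14 can be checked by brute force when the sample space is small and discrete. In the sketch below the six-point space and its probabilities are invented for illustration, and Fraction arithmetic keeps the comparisons exact. The sketch orders the points by $p_1/p_0$ and takes them until the region's content under $H_0$ reaches the chosen value. That value is $1 - \alpha$ in this chapter's notation. The result is the discrete analogue of the region $p_0 \le k p_1$. An exhaustive search over every region of the same content then confirms that no region has greater power. In a discrete space only certain contents can be attained exactly, which is why 1/5 is chosen here.

```python
from fractions import Fraction as Fr
from itertools import combinations

# Hypothetical six-point sample space: probabilities of each point under H0 and H1.
P0 = [Fr(1, 10)] * 4 + [Fr(3, 10)] * 2
P1 = [Fr(4, 10), Fr(1, 10), Fr(2, 10), Fr(1, 20), Fr(1, 10), Fr(3, 20)]


def likelihood_ratio_region(p0, p1, size):
    """Take points in decreasing order of p1/p0 until the H0-probability reaches `size`.

    This is the discrete analogue of p0 <= k p1 in (26.9). Returns None if the
    running total jumps past `size`, in which case no region of exactly that
    content is built from whole points. Assumes every p0 is positive.
    """
    order = sorted(range(len(p0)), key=lambda i: p1[i] / p0[i], reverse=True)
    region, content = set(), Fr(0)
    for i in order:
        if content >= size:
            break
        region.add(i)
        content += p0[i]
    return frozenset(region) if content == size else None


def best_by_search(p0, p1, size):
    """Among all regions with H0-probability exactly `size`, the one of greatest power."""
    points = range(len(p0))
    regions = [frozenset(c) for r in range(len(p0) + 1) for c in combinations(points, r)
               if sum((p0[i] for i in c), Fr(0)) == size]
    best = max(regions, key=lambda w: sum((p1[i] for i in w), Fr(0)))
    return best, sum((p1[i] for i in best), Fr(0))
```

With these numbers both routes give the region {0, 2}, whose power is 3/5.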

Example 26.2 

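Consider the normal population

$$ dF = \frac{1}{\sqrt{2\pi}} \exp\{-\tfrac{1}{2}(x - \mu)^2\} \, dx, \qquad -\infty < x < \infty. $$

Let the hypothesis $H_0$ be that $\mu = a_0$, and the alternative that $\mu = a_1$. We have

$$ p_0 = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{n} (x_i - a_0)^2\Big\}, \qquad p_1 = \frac{1}{(2\pi)^{n/2}} \exp\Big\{-\tfrac{1}{2}\sum_{i=1}^{n} (x_i - a_1)^2\Big\}. $$

We can conveniently express this in terms of the sample mean $\bar{x}$ and the sample variance $s^2$,
using $\sum_i (x_i - a_0)^2 = n s^2 + n(\bar{x} - a_0)^2$, obtaining for the density function

$$ p_0 = \frac{1}{(2\pi)^{n/2}} \exp\Big[-\frac{n}{2}\{s^2 + (\bar{x} - a_0)^2\}\Big]. $$

A similar expression is found for $p_1$ and thus, for the boundaries of the best critical region,
we have

$$ k = \frac{p_0}{p_1} = \exp\Big[-\frac{n}{2}\{(\bar{x} - a_0)^2 - (\bar{x} - a_1)^2\}\Big] = \exp\Big\{\frac{n}{2}(a_0 - a_1)(2\bar{x} - a_0 - a_1)\Big\}. $$

This yields for the critical region ($p_0 \le k p_1$)

$$ (a_0 - a_1)(2\bar{x} - a_0 - a_1) \le \frac{2}{n} \log k, $$

or

$$ (a_0 - a_1)\bar{x} \le \tfrac{1}{2}(a_0^2 - a_1^2) + \frac{1}{n}\log k = (a_0 - a_1)\bar{x}_0, \text{ say}. $$

If $a_1 < a_0$ the region is then defined by

$$ \bar{x} \le \bar{x}_0, $$

but if $a_1 > a_0$ it is defined by

$$ \bar{x} \ge \bar{x}_0. $$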

The reader should compare the two cases on a diagram similar to that of Fig. 26.1. 
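The sketch below works Example 26.2 numerically using only Python's standard library (`statistics.NormalDist`). Here `size` stands for the probability of the critical region under $H_0$, which is $1 - \alpha$ in this chapter's notation. The cut-off is placed so that the region has exactly that probability when $\mu = a_0$. This uses the fact that $\bar{x}$ is normal with mean $a_0$ and variance $1/n$ on $H_0$. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def best_region(a0, a1, n, size):
    """Cut-off and direction of the best critical region of Example 26.2.

    The population is normal with unit variance, so the mean of n values has
    standard error 1/sqrt(n). `size` is P(critical region | mu = a0).
    Returns (cut_off, "<=") when a1 < a0 and (cut_off, ">=") when a1 > a0.
    """
    se = 1 / sqrt(n)
    if a1 < a0:
        return a0 + NormalDist().inv_cdf(size) * se, "<="
    if a1 > a0:
        return a0 + NormalDist().inv_cdf(1 - size) * se, ">="
    raise ValueError("the alternative a1 must differ from a0")


def power(a0, a1, n, size):
    """P(sample mean falls in the best critical region | mu = a1)."""
    cut_off, direction = best_region(a0, a1, n, size)
    xbar = NormalDist(a1, 1 / sqrt(n))
    return xbar.cdf(cut_off) if direction == "<=" else 1 - xbar.cdf(cut_off)
```

For example, with $n = 25$, `size = 0.05`, $a_0 = 0$ and $a_1 = -0.5$, `power` returns about 0.80. The cut-off depends on $a_1$ only through the sign of $a_1 - a_0$, which is the point taken up in 26.16.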
Example 26.3 

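Consider again the normal population when the mean is known, say zero, but the
variance unknown, e.g.

$$ dF = \frac{1}{\sigma\sqrt{2\pi}} \exp\Big(-\frac{x^2}{2\sigma^2}\Big) \, dx, \qquad -\infty < x < \infty. $$

We now find, for hypotheses $\sigma = \sigma_0$ and $\sigma = \sigma_1$,

$$ \frac{p_0}{p_1} = \Big(\frac{\sigma_1}{\sigma_0}\Big)^n \exp\Big\{-\frac{n(\bar{x}^2 + s^2)}{2}\Big(\frac{1}{\sigma_0^2} - \frac{1}{\sigma_1^2}\Big)\Big\}, $$

which yields, for the best critical region ($p_0 \le k p_1$),

$$ (\bar{x}^2 + s^2)(\sigma_0^2 - \sigma_1^2) \le \frac{2\sigma_0^2\sigma_1^2}{n} \log\Big\{k\Big(\frac{\sigma_0}{\sigma_1}\Big)^n\Big\} = v(\sigma_0^2 - \sigma_1^2), \text{ say}. $$

Thus our critical regions are defined by

$$ m_2' = \bar{x}^2 + s^2 \le v \quad \text{if } \sigma_1 < \sigma_0, \qquad m_2' = \bar{x}^2 + s^2 \ge v \quad \text{if } \sigma_1 > \sigma_0. $$

The best critical regions in the space $W$ are thus bounded by hyperspheres centred at the
origin. Whether we take the space inside or the space outside a particular hypersphere
as the critical region depends on the alternative hypothesis. The probabilities concerned
can be evaluated directly without evaluating the constants $k$ and $v$. In fact, the
probability of exceeding a given value of

$$ \frac{n m_2'}{\sigma_0^2} = \frac{n(\bar{x}^2 + s^2)}{\sigma_0^2} = \frac{\sum_i x_i^2}{\sigma_0^2} $$

is obtainable from the $\chi^2$-distribution with $n$ degrees of freedom ; writing $n v / \sigma_0^2 = \chi_0^2$,
the relation between $v$ and $\alpha$ can be ascertained from the $\chi^2$-integral.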

In this particular case we may find without difficulty the power of an alternative test
which would suggest itself on intuitive grounds. Suppose we find $n v'/\sigma_0^2 = \chi_0'^2$ from
the $\chi^2$-distribution corresponding to $n - 1$ degrees of freedom and probability level $\alpha$,
and use, instead of the hyperspheres centred at the origin, those centred at the sample mean,

$$ s^2 \le v', \qquad s^2 \ge v'. $$

Suppose that the alternative is that $\sigma_1^2 = 1.1\,\sigma_0^2$. In testing for the alternative
$\sigma_1 > \sigma_0$ we should, for the test based on $v$, find $\chi_0^2$ and accept $\sigma_0$ if

$$ \frac{n m_2'}{\sigma_0^2} < \chi_0^2. $$

For instance, with $n = 5$, $1 - \alpha = 0.01$ we find $\chi_0^2 = 15.086$. On $H_1$ the quantity
$n m_2'/\sigma_1^2 = n m_2'/(1.1\,\sigma_0^2)$ is distributed as $\chi^2$ with $n$ degrees of freedom, so the
probability of an error of the second kind is

$$ \int_{W - w_0} p_1 \, dx = \int_0^{\chi_0^2/1.1} dF(\chi^2), $$

i.e. is obtained from the $\chi^2$-integral with argument $\chi_0^2/1.1 = 13.71$ ; it is about $0.982$,
giving $\beta(H_1 \mid w_0) = 0.018$.

On the other hand, had we used $\chi_0'^2$ instead of $\chi_0^2$ we should have entered the table with
four degrees of freedom, giving $13.277$. Divided by $1.1$ this gives $12.07$, resulting in a
probability of rather less than $0.017$. This is the power of the second test and is lower
than that of the first test, as of course it must be since the latter has maximum power.
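The figures quoted in Example 26.3 can be reproduced with a few lines of standard-library Python. The helper `chi2_sf` is written for this illustration and is not taken from any library. It evaluates the upper tail of $\chi^2$ by the closed forms that hold for integer degrees of freedom. The script then computes both powers under the example's assumptions: $n = 5$, $\sigma_1^2 = 1.1\,\sigma_0^2$ and the tabled 1 per cent points.

```python
from math import erfc, exp, factorial, pi, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with `df` degrees of freedom > x), df a positive integer.

    Uses the standard closed forms for integer df: a Poisson-type sum for even df,
    and an erfc term plus a finite series for odd df.
    """
    if x <= 0:
        return 1.0
    h = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!
        return exp(-h) * sum(h ** i / factorial(i) for i in range(df // 2))
    # Odd df: 2[1 - Phi(sqrt x)] + sqrt(2x/pi) exp(-x/2) * sum_{i=0}^{(df-3)/2} x**i / (1*3*...*(2i+1))
    series, term = 0.0, 1.0
    for i in range((df - 1) // 2):
        if i > 0:
            term *= x / (2 * i + 1)
        series += term
    return erfc(sqrt(h)) + sqrt(2 * x / pi) * exp(-h) * series


N = 5              # sample size in Example 26.3
RATIO = 1.1        # sigma_1**2 / sigma_0**2 under the alternative
CHI0_SQ = 15.086   # 1 per cent point of chi-square, N = 5 d.f.
CHI0P_SQ = 13.277  # 1 per cent point of chi-square, N - 1 = 4 d.f.

# On H1, n*m2'/sigma_1**2 is chi-square with N d.f., so the region n*m2'/sigma_0**2 >= CHI0_SQ
# has power P(chi-square_N > CHI0_SQ / RATIO); likewise for the s**2 test with N - 1 d.f.
power_best = chi2_sf(CHI0_SQ / RATIO, N)
power_alt = chi2_sf(CHI0P_SQ / RATIO, N - 1)

if __name__ == "__main__":
    print(f"best test: {power_best:.4f}   test on s**2: {power_alt:.4f}")
```

The two values come out close to the 0.018 and "rather less than 0.017" given above, with the best test slightly ahead.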

Simple Hypotheses : Families of Simple Alternatives

26.15. Consider now the case where $H_0$ is simple but $H_1$ is composite and consists
of a family of simple alternatives. The most frequently occurring case is the one in which
we have a class $\Omega$ of simple hypotheses of which $H_0$ is one and $H_1$ comprises the remainder ;
for example, the hypothesis $H_0$ may be that a mean has some value $\mu_0$ and the hypothesis
$H_1$ that it has some other value, unspecified.

For each of these other values we may apply the foregoing results and find, corresponding
to any particular member of $H_1$, a best critical region $w_0$. But this region in general
will vary from one member to another. We obviously cannot determine a different region
for all the unspecified possibilities and are therefore led to inquire whether there exists,
among the family of best critical regions, one which is the best for all of them. Such a
region is called the Uniformly Most Powerful region, and the test based on it the Uniformly
Most Powerful test, conveniently shortened to U.M.P. test.


26.16. Unfortunately, as we shall find below, the U.M.P. test does not usually
exist unless we restrict our family $\Omega$ in certain ways. Consider, for instance, the case
dealt with in Example 26.2. We found there that for $a_1 < a_0$ the best critical region for
a simple alternative was defined by

$$ \bar{x} \le \bar{x}_0. $$

Now the boundaries of the regions determined by $\bar{x} = \text{constant}$ do not depend on $a_1$ and
can be found directly from the sampling distribution of $\bar{x}$ when the probability level $1 - \alpha$
is given. Consequently the regions defined by $\bar{x} \le \bar{x}_0$ are the same for all $a_1 < a_0$ and hence
the test is U.M.P. for the class of hypotheses that $a_1 < a_0$. It is difficult to see how a better
test could be devised, for, whatever $a_1$ subject to $a_1 < a_0$, the test controls errors of the first
kind and minimises those of the second.

However, if $a_1 > a_0$ the best critical regions are defined by $\bar{x} \ge \bar{x}_0'$. Here again, if
our class $\Omega$ is confined to the values of $a_1$ greater than $a_0$ the test is U.M.P. But if $a_1$ can
be either greater or less than $a_0$ no U.M.P. test is possible. The reader will easily verify
for himself that the same is true for the test considered in Example 26.3.
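The failure of a U.M.P. test against two-sided alternatives can be seen numerically in the setting of Example 26.2. The sketch below uses arbitrary illustrative values for $n$, the size of the region ($1 - \alpha$ in this chapter's notation) and the alternatives. It computes the power of the two one-sided best regions at a mean below $a_0$ and at one above it. Each region is best on its own side and has power below its size on the other side, so neither can be best for both.

```python
from math import sqrt
from statistics import NormalDist


def one_sided_powers(a0, a1, n, size):
    """Power at mu = a1 of the regions xbar <= lower and xbar >= upper (unit variance).

    Both regions have probability `size` when mu = a0.
    """
    se = 1 / sqrt(n)
    lower = a0 + NormalDist().inv_cdf(size) * se
    upper = a0 + NormalDist().inv_cdf(1 - size) * se
    xbar = NormalDist(a1, se)
    return xbar.cdf(lower), 1 - xbar.cdf(upper)


if __name__ == "__main__":
    for a1 in (-0.5, 0.5):
        low, high = one_sided_powers(0.0, a1, 25, 0.05)
        print(f"a1 = {a1:+.1f}: lower-tail region {low:.4f}, upper-tail region {high:.4f}")
```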

26.17. We now show formally that for a simple hypothesis depending on $\theta_0$ (the
value taken by the parameter $\theta$ defining a family of alternatives) no U.M.P. test exists
for both positive and negative values of $\theta - \theta_0$ if the frequency function $p(E \mid \theta)$ is
continuous, has everywhere a continuous derivative with respect to $\theta$ which does not vanish
identically, and admits of differentiation under the sign of integration over $W$.

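Suppose that such a test does exist, with best critical region w₀. Then for any θ we have, inside w₀,

p₀ < kp,

which we may write

$$ p(E\mid\theta) > h(\theta)\,p_0(E\mid\theta_0). \qquad (26.12) $$

Likewise, for any point E′ on the boundary of w₀ we have

$$ p(E'\mid\theta) = h(\theta)\,p_0(E'\mid\theta_0). \qquad (26.13) $$

By hypothesis p is differentiable in θ and hence so is h. Moreover, as θ → θ₀, h(θ) → 1. Hence if

Δ = θ − θ₀

and primes denote differentiation with respect to θ, we have

$$ h(\theta) = 1 + \Delta\,\frac{[p'(E'\mid\theta)]_{\theta_0+g\Delta}}{p_0(E'\mid\theta_0)}, \qquad 0<g<1. \qquad (26.14) $$

Further we have

$$ p(E\mid\theta) = p_0(E\mid\theta_0) + \Delta\,[p'(E\mid\theta)]_{\theta_0+r\Delta}, \qquad 0<r<1. \qquad (26.15) $$

Substituting in (26.12) from (26.14) and (26.15), we find

$$ \Delta\left\{[p'(E\mid\theta)]_{\theta_0+r\Delta} - \frac{p_0(E\mid\theta_0)}{p_0(E'\mid\theta_0)}\,[p'(E'\mid\theta)]_{\theta_0+g\Delta}\right\} > 0. \qquad (26.16) $$

This is true for any E and E′ and for all Δ, whatever its sign. Letting Δ → 0 from either side, the limit of the expression in curly brackets must be both non-negative and non-positive, and hence it vanishes. Thus we have

$$ [p'(E\mid\theta)]_{\theta_0} - \frac{p_0(E\mid\theta_0)}{p_0(E'\mid\theta_0)}\,[p'(E'\mid\theta)]_{\theta_0} = 0. \qquad (26.17) $$

Similarly this equation may be shown to hold outside w₀, and hence it is true throughout W.

Now we have

$$ \int_W p(E\mid\theta)\,dx = 1 $$

and hence, differentiating with respect to θ and putting θ = θ₀,

$$ \int_W [p'(E\mid\theta)]_{\theta_0}\,dx = 0. $$

Substituting from (26.17), we have

$$ \frac{[p'(E'\mid\theta)]_{\theta_0}}{p_0(E'\mid\theta_0)}\int_W p_0(E\mid\theta_0)\,dx = 0 $$

and hence

$$ \frac{[p'(E'\mid\theta)]_{\theta_0}}{p_0(E'\mid\theta_0)} = 0. \qquad (26.18) $$

Thus, from (26.17),

$$ [p'(E\mid\theta)]_{\theta_0} = 0. \qquad (26.19) $$

But this implies that the derivative of p with respect to θ is identically zero at θ₀, which is contrary to hypothesis. The theorem follows.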

It may be noted that in deriving (26.17) from (26.16) we used the property that Δ may have either sign. If it can have only one sign, that is, if our class of admissible alternatives is confined to the case when either θ < θ₀ or θ > θ₀, a U.M.P. test may exist; and so we found in Examples 26.2 and 26.3.
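The failure for two-sided alternatives can be seen numerically under the same assumptions (normal mean, known σ). The sketch compares the lower-tail and upper-tail regions of equal size: each is the more powerful on its own side of a₀ and has power below its size on the other side, so neither can be best for all a₁. The helper `one_sided_powers` is hypothetical.

```python
from statistics import NormalDist

def one_sided_powers(a1, a0=0.0, sigma=1.0, n=25, size=0.05):
    """Power at mean a1 of the regions xbar < c_low and xbar > c_high, both of size `size`."""
    se = sigma / n ** 0.5
    null, alt = NormalDist(a0, se), NormalDist(a1, se)
    c_low, c_high = null.inv_cdf(size), null.inv_cdf(1 - size)
    return alt.cdf(c_low), 1 - alt.cdf(c_high)

for a1 in (-0.4, 0.4):
    low, high = one_sided_powers(a1)
    print(f"a1 = {a1:+.1f}: lower-tail {low:.3f}, upper-tail {high:.3f}")
```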


Best Critical Regions and Likelihood 

26.18. Since on the boundary of a best critical region we have p₀ − kp₁ = 0, that boundary is determined by the condition that on it the ratio of the likelihoods of the two functions corresponding to H₀ and H₁ is constant.

Consider now the case where H₁ comprises a set of alternatives varying according to the parameter θ, H_θ being one of them. In accordance with the principle of maximum likelihood we should obtain, as the most likely value of θ, the solution of

$$ \frac{\partial p}{\partial\theta} = 0, \qquad (26.20) $$

where θ is then expressed as a function of the variables. If this value is substituted in p, we obtain the distribution with greatest likelihood, which may be written p(θ max.). The surfaces of constant likelihood ratio are defined for this distribution by

$$ p_0 - k\,p(\theta\ \text{max.}) = 0. \qquad (26.21) $$

Now these surfaces are, in fact, the envelopes of the family, varying with θ,

$$ p_0 - k\,p_\theta = 0, \qquad (26.22) $$

for to obtain the envelope we differentiate with respect to θ, giving ∂p_θ/∂θ = 0, and eliminate θ, leading back to (26.21). Thus, if there exists a best critical region (and hence a U.M.P. test) for all permissible alternatives H_θ, such a region will be the envelope with respect to such alternatives and will therefore be identical with a region defined by (26.21); hence a test based on the principle of likelihood leads to best critical regions, if they exist. If, as is more usual, there is no common best critical region, the ratio of the likelihood of H₀ to that of any particular H_θ is k_θ, say. The surface (26.21) remains the envelope of the family of surfaces (26.22) for which k_θ = k.


Example 26.4 

Consider once again the normal form, where both mean μ₀ and variance σ₀² are specified and the admissible alternatives are that they can have any values, subject of course to the variance being positive. For any given μ₁ and σ₁ the best critical region will be given by

p₀ < kp₁

or

$$ \frac{1}{\sigma_0^n}\exp\left\{-\frac{\sum(x-\mu_0)^2}{2\sigma_0^2}\right\} < \frac{k}{\sigma_1^n}\exp\left\{-\frac{\sum(x-\mu_1)^2}{2\sigma_1^2}\right\}. $$

Since Σ(x − μ)² = n{(x̄ − μ)² + s²}, this may be written in the form

$$ n\left(\frac{1}{\sigma_0^2}-\frac{1}{\sigma_1^2}\right)\{(\bar x-\rho)^2+s^2\} > \text{constant}, $$

where

$$ \rho = \frac{\mu_1\sigma_0^2-\mu_0\sigma_1^2}{\sigma_0^2-\sigma_1^2}. $$

Thus, if σ₁ > σ₀ we have

$$ (\bar x-\rho)^2+s^2 > v^2, \text{ say}; $$

and if σ₁ < σ₀ we have

$$ (\bar x-\rho)^2+s^2 < v^2. $$

For any specified μ₁ and σ₁ the best critical regions are bounded by hyperspheres with radius v√n and centre at x₁ = x₂ = … = xₙ = ρ. Owing to the fact that ρ varies with μ₁ and σ₁, there will not in general be a common best critical region and a U.M.P. test; and this remains true even if we limit our alternatives to σ₁ < σ₀ and μ₁ < μ₀, or by similar inequalities.
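The reduction of p₀ < kp₁ to a hypersphere can be checked numerically. In the sketch below, assuming σ₁ ≠ σ₀ and the divisor-n s² used in the text, `in_bcr_circle` applies the criterion (x̄ − ρ)² + s² ≶ v², with ρ as above and v² obtained by completing the square, while `in_bcr_direct` compares the two likelihoods directly; the two should agree on every sample. The function names, the constant k and the simulated data are illustrative.

```python
import math
import random

def log_lik(xs, mu, sigma):
    """Normal log-likelihood, dropping the factor (2*pi)**(-n/2) common to p0 and p1."""
    return -len(xs) * math.log(sigma) - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

def in_bcr_direct(xs, mu0, s0, mu1, s1, k):
    """Best critical region p0 < k * p1, evaluated directly."""
    return log_lik(xs, mu0, s0) < math.log(k) + log_lik(xs, mu1, s1)

def centre(mu0, s0, mu1, s1):
    """rho = (mu1*s0^2 - mu0*s1^2) / (s0^2 - s1^2)."""
    return (mu1 * s0 ** 2 - mu0 * s1 ** 2) / (s0 ** 2 - s1 ** 2)

def in_bcr_circle(xs, mu0, s0, mu1, s1, k):
    """Same region as a circle of centre (rho, 0) in the (xbar, s) plane; needs s1 != s0."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    a = 1 / s1 ** 2 - 1 / s0 ** 2
    rho = centre(mu0, s0, mu1, s1)
    c = 2 * math.log(k) / n + 2 * math.log(s0 / s1)
    v2 = (c + a * rho ** 2 - mu1 ** 2 / s1 ** 2 + mu0 ** 2 / s0 ** 2) / a
    r2 = (xbar - rho) ** 2 + s2
    return r2 < v2 if s1 < s0 else r2 > v2

random.seed(1)
cases = [(0.0, 1.0, 0.5, 0.5, 2.0), (0.0, 1.0, -0.3, 2.0, 0.5)]
agree = all(
    in_bcr_direct(xs, *p) == in_bcr_circle(xs, *p)
    for p in cases
    for xs in ([random.gauss(0.2, 1.3) for _ in range(8)] for _ in range(500))
)
print(agree, centre(0.0, 2.0, 3.0, 1.0))   # the second value is case (3): sigma1 = sigma0 / 2
```

With σ₁ = ½σ₀ the centre is μ₀ + ⁴⁄₃(μ₁ − μ₀), which is case (3) of Fig. 26.2.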

We may regard x̄ and s as independent variables and represent the data on a two-way plane (x̄, s). The best critical regions are then seen to be bounded by circles with centre (ρ, 0) and radius v. Fig. 26.2 (adapted from Neyman and Pearson, 193dc) illustrates some of the contours for particular cases. A single curve, corresponding to a single probability level, is shown in each case.

Cases (1) and (2): σ₁ = σ₀ and ρ = ±∞. The best critical region lies on the right of the line (1) if μ₁ > μ₀ and on the left of (2) if μ₁ < μ₀. This is the case discussed in Example 26.2.

Case (3): σ₁ < σ₀, say σ₁ = ½σ₀. Then ρ = μ₀ + ⁴⁄₃(μ₁ − μ₀) and the region lies inside the semicircle marked (3).

Case (4): σ₁ < σ₀ and μ₁ = μ₀. The region is inside the semicircle (4).

Case (5): σ₁ > σ₀. The region is outside the semicircle (5).

There is evidently no common best critical region for these cases. The regions of acceptance, however, may have a common part, centred round the value (μ₀, σ₀), and we should expect them to do so.

Fig. 26.2. Contours of Constant Likelihood in a Two-dimensional Case. (See text.)

Let us find the envelope of the best critical regions, which is, of course, the same as that of the regions of acceptance. The likelihood ratio is

$$ \lambda = \frac{p_0}{p_1} = \left(\frac{\sigma_1}{\sigma_0}\right)^n\exp\left\{-\frac{n[(\bar x-\mu_0)^2+s^2]}{2\sigma_0^2} + \frac{n[(\bar x-\mu_1)^2+s^2]}{2\sigma_1^2}\right\}. $$

The partial differentials of log λ with respect to μ₁ and σ₁, equated to zero, give

$$ \frac{n}{\sigma_1} - \frac{n}{\sigma_1^3}\{(\bar x-\mu_1)^2+s^2\} = 0, \qquad -\frac{n}{\sigma_1^2}(\bar x-\mu_1) = 0, $$

whence we find μ₁ = x̄ and σ₁ = s, and the envelope is

$$ \left(\frac{s}{\sigma_0}\right)^n\exp\left\{\frac{n}{2} - \frac{n[(\bar x-\mu_0)^2+s^2]}{2\sigma_0^2}\right\} = k. $$

The dotted curve in Fig. 26.2 shows one such envelope. It touches the boundaries of all the critical regions which have the same likelihood ratio k. The space inside may be regarded as a “good” region of acceptance and the space outside accordingly as a good critical region.

There is no best region for all alternatives, but the regions determined by envelopes of likelihood-ratio regions effect a sort of compromise by picking out and amalgamating parts of critical regions which are best for individual alternatives.
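A sketch of the envelope statistic, assuming a normal sample and the divisor-n s² of the text: λ equals p₀ divided by p₁ evaluated at μ₁ = x̄, σ₁ = s, so it never exceeds 1 and reaches 1 exactly when x̄ = μ₀ and s = σ₀; the acceptance regions λ > k therefore surround the point (μ₀, σ₀). The function names and simulated data are illustrative.

```python
import math
import random

def envelope_ratio(xs, mu0, sigma0):
    """lambda = (s/sigma0)^n * exp{n/2 - n[(xbar - mu0)^2 + s^2] / (2 sigma0^2)}."""
    n = len(xs)
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n
    log_lam = (n / 2) * math.log(s2 / sigma0 ** 2) + n / 2 \
        - n * ((xbar - mu0) ** 2 + s2) / (2 * sigma0 ** 2)
    return math.exp(log_lam)

def log_lik(xs, mu, sigma):
    return -len(xs) * math.log(sigma) - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2)

# Cross-check: lambda is p0 divided by p1 evaluated at mu1 = xbar, sigma1 = s
random.seed(7)
xs = [random.gauss(0.3, 1.5) for _ in range(10)]
xbar = sum(xs) / len(xs)
s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / len(xs))
direct = math.exp(log_lik(xs, 0.0, 1.0) - log_lik(xs, xbar, s))
print(envelope_ratio(xs, 0.0, 1.0), direct)
```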


Example 26.5 


In the previous example we have supposed that the sample space W was the same for all admissible alternatives. This is quite legitimate, for we can always regard the domain of variation as infinite by supposing that p = 0 outside the range of the frequency distribution of the variates. In the normal case, of course, p does not vanish anywhere, so that we are compelled to consider W as infinite.

When, however, the sample-space for non-vanishing p is bounded, special circumstances may arise, and it is occasionally necessary to consider separately the different discriminating regions. For instance, if the sample-spaces corresponding to H₀ and H₁ are W₀ and W₁, it may happen that W₀ and W₁ have no common part, i.e. that p₀ and p₁ are never both greater than zero. If so, we can distinguish between H₀ and H₁ with certainty. If there is a common region W₀₁ then W₁ − W₀₁ should be included in the best critical region, for to do so involves no errors of the first kind (p₀ vanishes there) and reduces the probability of errors of the second kind. But it does not follow that this should constitute the whole of the critical region, for we might then commit too many errors of the second kind, i.e. accept H₀ too often when H₁ is true. We may then wish to add to W₁ − W₀₁ a region W₀₀, making w₀ altogether, such that W₀₀ lies inside W₀₁ and

p₀(E ∈ W₀₀) = p₀(E ∈ w₀) = 1 − α.

This controls the first kind of error at 1 − α and reduces the second kind of error.

Consider the population

$$ p(x) = \begin{cases} 1/b, & a-\tfrac12 b < x < a+\tfrac12 b,\\ 0, & \text{elsewhere.} \end{cases} $$

Suppose a sample of n to have been drawn from a population of this kind where b is known. We wish to test whether a has some value a₀ as against the alternative a₁.

The sample-spaces W₀ and W₁ are hypercubes centred at (a₀, …, a₀) and (a₁, …, a₁). If they have a common part W₀₁ the probabilities p₀ and p₁ in that part are both proportional to the volume and p₀/p₁ = 1 everywhere in the region. If, then, we take any region W₀₀ of content 1 − α in W₀₁ and add it to W₁ − W₀₁ we get a best critical region, and there are clearly infinitely many such.

For the admissible alternatives a₁ the hypercube W₁ will move along the long diagonal x₁ = x₂ = … = xₙ as a₁ varies, and we cannot always find a common region of size 1 − α to form W₀₀. By taking such a region as a hypercube of side b(1 − α)^{1/n}, however, fitted into one of the corners of W₀ lying on the long diagonal, we “nearly” obtain such an object, since this region provides what is required so long as W₀ and W₁ have a common part of content 1 − α. Which corner we choose depends on whether the hypothesis is a₁ > a₀ or a₀ > a₁.
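The construction can be simulated. The sketch below, with illustrative names and values, rejects H₀ when any observation lies outside W₀ (where p₀ = 0) or when the whole sample falls in the corner hypercube of side b·size^{1/n} facing the alternatives, `size` being the text's 1 − α. Under H₀ the rejection rate should be close to `size`; for an alternative with a₁ − a₀ = 0.3, b = 1, n = 3 the power is 1 − 0.7³ + size, valid here because the corner cube (side about 0.37) fits inside the common part (side 0.7).

```python
import random

def reject(sample, a0, b, size, upper=True):
    """Critical region of Example 26.5 (sketch): any point outside W0, or the whole
    sample inside the corner hypercube of side b * size**(1/n) facing the alternatives."""
    n = len(sample)
    lo, hi = a0 - b / 2, a0 + b / 2
    if any(x <= lo or x >= hi for x in sample):   # p0 = 0 here: no first-kind error
        return True
    side = b * size ** (1 / n)
    if upper:                                     # alternatives a1 > a0
        return all(x > hi - side for x in sample)
    return all(x < lo + side for x in sample)    # alternatives a1 < a0

def rejection_rate(a_true, a0=0.0, b=1.0, n=3, size=0.05, reps=50_000, seed=0):
    """Monte Carlo probability of rejection when the true centre is a_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.uniform(a_true - b / 2, a_true + b / 2) for _ in range(n)]
        hits += reject(sample, a0, b, size)
    return hits / reps

print(rejection_rate(0.0), rejection_rate(0.3))
```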
Relation between U.M.P. Tests and Sufficient Estimators

26.19. It was thought at one time that the existence of a set of U.M.P. tests for a continuous range of admissible alternatives involved the existence of a sufficient estimator for the parameter concerned. This does not appear to be true in full generality, but is so in nearly all the cases occurring in statistical practice. We will prove a theorem on the subject:

If a system of U.M.P. tests exists and if every point in the sample-space lies on the boundary of some best critical region, then a sufficient estimator exists for the parameter whose variation provides the admissible alternatives.*

It is enough to show that for an arbitrary point we have

$$ p_1(E) = h(t,\theta)\,p_0(E), \qquad (26.23) $$

for then t is sufficient for θ by definition. Now we know that on the boundary of a critical region we have

$$ \frac{p_1(E)}{p_0(E)} = \frac{1}{k} = h, \text{ say}, $$

where h varies with the x's and with θ. We show that h has the form h(t, θ) by defining a function t and showing that if t has the same value at any two points E₁ and E₂, then

$$ \frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)} $$

for all θ.

26.20. For this purpose we require a lemma to the following effect. If a set of U.M.P. tests exists, it will be said to be ordered if the condition α₁ > α₂ implies that the critical region w(α₁) is included in the region w(α₂); and if a set of U.M.P. tests exists but is not ordered we can always find another set which is.

w(α₁) and w(α₂) may include parts of W where p₀ vanishes. Let the remaining parts be v(α₁) and v(α₂) and, if v₀ is the common part of these regions, write

$$ v(\alpha_1) = v_0 + v', \qquad v(\alpha_2) = v_0 + v'', \qquad (26.24) $$

where v₀, v′ and v″ have no common points. Now for any value of θ and for any E in w(α₁), and therefore in v′, there is an h₁ such that

$$ p_1(E) \ge h_1 p_0(E) \text{ in } v', \qquad p_1(E) \le h_1 p_0(E) \text{ outside } w(\alpha_1), \text{ and therefore in } v''. $$

Similarly, within w(α₂) and hence within v″ we have an h₂ such that

$$ p_1(E) \ge h_2 p_0(E) \text{ in } v'', \qquad p_1(E) \le h_2 p_0(E) \text{ in } v'. $$

It follows from the inequalities holding in v″ that h₁ ≥ h₂, and similarly, from those in v′, that h₂ ≥ h₁. Hence h₁ = h₂ = h, say, and

$$ p_1(E) = h\,p_0(E) \qquad (26.25) $$

within v′ and v″ for any θ.

* The theorem remains true if there is a set of points of measure zero for which the condition as to boundaries is not fulfilled. It is also true for several parameters, as may be seen by an easy generalisation of the argument. See Neyman and Pearson (1936a).
Now take a region v‴ within v″ such that, if

$$ \bar v(\alpha_1) = v_0 + v''', \qquad (26.26) $$

then

$$ \int_{\bar v(\alpha_1)} p_0\,dx = 1-\alpha_1. \qquad (26.27) $$

This is always possible, for the integral of p₀ over v₀ + v″ is 1 − α₂, which is greater than 1 − α₁. It follows from (26.27) and the first equation of (26.24) that

$$ \int_{v'''} p_0\,dx = \int_{v'} p_0\,dx. \qquad (26.28) $$

Now put

$$ w'(\alpha_1) = W_0 + \bar v(\alpha_1) = W_0 + v_0 + v''', $$

where W₀ is the part of W for which p₀ = 0. Then from (26.27)

$$ \int_{w'(\alpha_1)} p_0\,dx = 1-\alpha_1. $$

Further, w′(α₁) is a best critical region with respect to admissible alternatives, for (26.28) and the relation p₁ = hp₀, which holds within v′ and v″, imply that

$$ \int_{v'''} p_1\,dx = \int_{v'} p_1\,dx $$

and hence

$$ \int_{\bar v(\alpha_1)} p_1\,dx = \int_{v(\alpha_1)} p_1\,dx. $$

Finally, w′(α₁) is wholly included in w(α₂).

We have therefore replaced the region w(α₁) by another region w′(α₁) with the same properties except that it is included in w(α₂). The lemma follows.


26.21. To return now to the main proposition, let E be any point of W. If it belongs to only one boundary of a best critical region with content 1 − α we put t(E) = 1 − α. If it belongs to more than one, we put t(E) equal to the mean of the upper and lower bounds of the values of 1 − α for which the boundaries include E. In virtue of the lemma, whatever the value of 1 − α between these bounds, the corresponding boundary must then contain E.

Thus t is defined everywhere. Further, if it has the same value at two points E₁ and E₂ these points must lie on the same boundary. It follows that on this boundary

$$ \frac{p_1(E_1)}{p_0(E_1)} = \frac{p_1(E_2)}{p_0(E_2)}, $$

and hence the theorem is proved.

The converse is not generally true, but one has to exercise some ingenuity and import some artificiality to construct examples where it fails. Cf. Exercises 26.3 and 26.4.
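For a concrete instance of (26.23), take (as an assumed illustration, not one of the examples of this section) normal samples with known σ and alternatives indexed by the mean θ. The ratio p₁/p₀ then depends on the sample only through t = x̄, so two different sample points with the same mean give the same ratio for every θ, which is the property used above to define t. Both helpers are hypothetical.

```python
import math

def lik_ratio(xs, theta, theta0=0.0, sigma=1.0):
    """p1(E)/p0(E) for a normal sample with known sigma: mean theta against theta0."""
    return math.exp(sum((x - theta0) ** 2 - (x - theta) ** 2 for x in xs) / (2 * sigma ** 2))

def h(t, n, theta, theta0=0.0, sigma=1.0):
    """The same ratio written as h(t, theta) with t = xbar."""
    return math.exp(n * (theta - theta0) * (2 * t - theta - theta0) / (2 * sigma ** 2))

E1, E2 = [0.0, 1.0, 2.0], [1.0, 1.0, 1.0]      # different points, same mean t = 1
for theta in (-1.0, 0.5, 2.0):
    print(theta, lik_ratio(E1, theta), lik_ratio(E2, theta), h(1.0, 3, theta))
```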

Composite Hypotheses 

26.22. We shall consider a class Ω of admissible hypotheses depending on r + s parameters θ₁ … θ_r … θ_{r+s}, and shall regard the hypothesis H₀ under test as one of this class. A composite hypothesis of r degrees of freedom is one for which s of the parameters, say θ_{r+1} … θ_{r+s}, are specified, the hypothesis determining the distribution apart from the unspecified parameters. For example, the hypothesis that a population is normal with specified mean, nothing being supposed about the variance, is a composite hypothesis of one degree of freedom. It will be assumed that any admissible simple alternative is given by specifying the other r parameters θ₁ … θ_r and that there is a common sample-space W for all such alternatives.

Regions Similar to the Sample Space 

26.23. In order to test the composite hypothesis we need in the first place to control errors of the first kind by determining a critical region w such that

$$ \int_w p_0\,dx = 1-\alpha. \qquad (26.29) $$

This, however, differs from the simple case in that p₀ can vary according to the unknown parameters, and to be certain of controlling the error we must be able to find w such that (26.29) is true whatever θ₁ … θ_r. If this can be done we shall call the region w similar to the sample-space W and shall speak of 1 − α as its size.

The problem of testing composite hypotheses then becomes one of (a) finding the similar regions, and (b) selecting from among those regions the one which minimises the second kind of error for a simple admissible alternative H_t. If this is the same for all H_t we shall have a common best critical region.


26.24. We consider in the first place the composite hypothesis with one degree of freedom. The general problem of finding similar regions in such a case has not been solved, but a solution is possible in one important class of case, namely, that for which

(a) p₀ is indefinitely differentiable with respect to θ₁ for almost all values of θ₁;

(b) the function φ obeys the relation

$$ \frac{\partial\phi}{\partial\theta_1} = A + B\phi, \qquad (26.30) $$

where

$$ \phi = \frac{\partial}{\partial\theta_1}\log p_0, \qquad (26.31) $$

and A and B depend on θ₁ but not on the x's. In particular the normal distribution is of this type.

Under conditions (a) and (b) it follows that for w to be similar to W it is necessary and sufficient that

$$ \int_w \frac{\partial^k p_0}{\partial\theta_1^k}\,dx = 0, \qquad k = 1, 2, \ldots \qquad (26.32) $$

Let w be a region for which (26.32) is true. Then for k = 1 and 2, since ∂p₀/∂θ₁ = p₀φ, we have

$$ \int_w p_0\phi\,dx = 0, \qquad \int_w p_0(\phi^2+\phi')\,dx = 0. $$

In virtue of (26.30), this last may be written

$$ \int_w p_0(\phi^2 + A + B\phi)\,dx = 0, $$

whence

$$ \int_w p_0\phi^2\,dx = -A\int_w p_0\,dx = -A(1-\alpha). \qquad (26.33) $$
Differentiating (26.33) with respect to θ₁ and using previous results, we find

$$ \int_w p_0\phi^3\,dx = (2AB - A')(1-\alpha), \qquad (26.34) $$

and generally

$$ \int_w p_0\phi^k\,dx = (1-\alpha)\,\psi_k(\theta_1), \qquad (26.35) $$

where ψ_k(θ₁) is a function of θ₁ only, and is therefore independent of w. Now (26.32) is also true for w = W, with 1 − α replaced by 1, and we find

$$ \int_W p_0\phi^k\,dx = \psi_k(\theta_1), \qquad (26.36) $$

so that

$$ \frac{1}{1-\alpha}\int_w p_0\phi^k\,dx = \int_W p_0\phi^k\,dx. \qquad (26.37) $$

Now consider the random variable φ. Since p₀ integrated through w is equal to 1 − α, we may regard p₀/(1 − α) as a frequency function defined in w. It follows from (26.37) that the moments of φ in this domain are the same as those of φ in W. Consequently, if the moments determine the distribution uniquely, the distributions of φ are identical.

Hence we may use the hypersurfaces φ = constant to set up similar regions. The space W may be imagined as composed of shells of infinite thinness bounded by these hypersurfaces. If we determine an “area” on one of these shells equal to 1 − α times its area in W, the totality of such areas will constitute a region w of size 1 − α; and since this will be so irrespective of θ₁ the region w is similar to W.
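The defining property (26.29) of a similar region can be illustrated by simulation. Assuming normal samples under H₀: μ = μ₀, the sketch estimates the H₀-probability of the region x̄ − μ₀ > c·s (s with divisor n) for several values of the nuisance parameter σ. The region is a cone with vertex at (μ₀, …, μ₀), so changing σ only moves a sample along a ray through that vertex and the estimates coincide. The constant c, the seed and the helper name are arbitrary illustrative choices.

```python
import random

def region_probability(sigma, mu0=5.0, n=6, c=0.5, reps=20_000, seed=42):
    """Monte Carlo P(xbar - mu0 > c*s | mu = mu0, sigma), with s computed using divisor n."""
    rng = random.Random(seed)      # same seed: same standard normal draws for every sigma
    hits = 0
    for _ in range(reps):
        xs = [mu0 + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
        xbar = sum(xs) / n
        s = (sum((x - xbar) ** 2 for x in xs) / n) ** 0.5
        hits += (xbar - mu0) > c * s
    return hits / reps

rates = [region_probability(sig) for sig in (0.1, 1.0, 30.0)]
print(rates)
```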


26.25. When similar regions are determined by the above method we have to find the best critical region from among them. Let H_t be a simple admissible alternative. We require to find from the regions w a region w₀ such that

$$ \int_{w_0} p_t\,dx = \text{maximum}. \qquad (26.38) $$

We now show that this is equivalent to maximising, for each φ,

$$ \int_{w(\phi)} p_t\,dw(\phi) \qquad (26.39) $$

subject to

$$ \int_{w(\phi)} p_0\,dw(\phi) = (1-\alpha)\int_{W(\phi)} p_0\,dW(\phi). \qquad (26.40) $$

Here w(φ) means the element of w for constant φ, the “shell” of the previous section. The object of this is to reduce our present case to that of simple hypotheses. We take φ as a new variable and consider together the remaining variables (which amounts to determining similarity of w and W in each separate shell between φ and φ + dφ, as in the previous section), and are thus left with regions dependent on φ. Equation (26.39) then requires that the probability of the second kind of error in each shell must be a minimum, subject to the control of the first kind asserted by (26.40).
Suppose that (26.39) were not maximised. There would then exist a set of values of φ for each of which we could determine a region v(φ) such that

$$ \int_{v(\phi)} p_0\,dv(\phi) = (1-\alpha)\int_{W(\phi)} p_0\,dW(\phi) \qquad (26.41) $$

and

$$ \int_{v(\phi)} p_t\,dv(\phi) > \int_{w_0(\phi)} p_t\,dw_0(\phi). \qquad (26.42) $$

Let E be this set of values of φ and CE the remaining set. We prove our result by obtaining a contradiction, namely by defining a region v which is similar to W, and such that

$$ \int_v p_t\,dx > \int_{w_0} p_t\,dx, \qquad (26.43) $$

which contradicts (26.38).

Take as v the shells of hypersurfaces (1) in CE which are identical with w₀(φ) and (2) in E which satisfy (26.42). Every shell of v satisfies (26.40) or (26.41), so v is similar to W. Now

$$ \int_v p_t\,dx = \int_{E+CE} d\phi\int_{v(\phi)} p_t\,dv(\phi) $$

and

$$ \int_{w_0} p_t\,dx = \int_{E+CE} d\phi\int_{w_0(\phi)} p_t\,dw_0(\phi). $$

Hence

$$ \int_v p_t\,dx - \int_{w_0} p_t\,dx = \int_{E+CE} d\phi\left\{\int_{v(\phi)} p_t\,dv(\phi) - \int_{w_0(\phi)} p_t\,dw_0(\phi)\right\} = \int_{E} d\phi\left\{\int_{v(\phi)} p_t\,dv(\phi) - \int_{w_0(\phi)} p_t\,dw_0(\phi)\right\} > 0, \qquad (26.44) $$

which is the contradiction required.


26.26. Thus our problem is reduced to that of finding, in the shells φ = constant, portions w₀(φ) which maximise the integral of p_t. We have, so to speak, brought the problem down one dimension by locating it in shells instead of dealing with it throughout the spaces w and W. It now becomes that of a simple hypothesis in (n − 1) dimensions, and the best critical region is the one for which

$$ p_t > k\,p_0, \qquad (26.45) $$

where k is a function of φ. The sum of these regions for the various values of φ gives us the complete solution to the problem, and if this sum has boundaries which are independent of H_t we have a common best critical region and a U.M.P. test.

Example 26.6: “Student’s” Hypothesis

A single sample of n is taken from a normal population with mean μ and unspecified standard deviation σ. We have then one degree of freedom, θ₁ = σ, and the hypothesis H₀ is that μ = μ₀, say.

We find

$$ \phi = \frac{\partial}{\partial\sigma}\log p_0 = -\frac{n}{\sigma} + \frac{\sum(x-\mu_0)^2}{\sigma^3}, $$

$$ \frac{\partial\phi}{\partial\sigma} = \frac{n}{\sigma^2} - \frac{3\sum(x-\mu_0)^2}{\sigma^4} = -\frac{2n}{\sigma^2} - \frac{3}{\sigma}\phi. $$

Condition (26.30) is satisfied, and φ is constant over the hypersurfaces

$$ \sum(x-\mu_0)^2 = n\{(\bar x-\mu_0)^2+s^2\} = \text{constant}. $$

The hypersurfaces are hyperspheres in W. To construct a similar region we have merely to pick out a region of size 1 − α on each shell and to amalgamate them. In our present case this is particularly easy because p₀ is constant over the shells and we need only pick out areas on each shell bearing to the area of the hypersphere the ratio 1 − α.

These areas need not be of the same shape or similarly situated. By selecting them in different ways an infinite variety of regions may be constructed. We have to find the best for an alternative simple hypothesis σ = σ₁, μ = μ₁.

The condition (26.45) becomes

$$ \frac{1}{\sigma_1^n}\exp\left[-\frac{n}{2\sigma_1^2}\{(\bar x-\mu_1)^2+s^2\}\right] > \frac{k}{\sigma^n}\exp\left[-\frac{n}{2\sigma^2}\{(\bar x-\mu_0)^2+s^2\}\right]. $$

As we are dealing with regions which are similar with regard to σ, we may put σ = σ₁ and find

$$ \bar x(\mu_1-\mu_0) > \tfrac12(\mu_1^2-\mu_0^2) + \frac{\sigma_1^2}{n}\log k = (\mu_1-\mu_0)\,k_1, \text{ say}, $$

where k₁ = k₁(φ). Thus we find, for the boundary of w₀(φ),

x̄ > k₁(φ) if μ₁ > μ₀,
x̄ < k₁(φ) if μ₁ < μ₀,

where k₁ has to be chosen so as to satisfy

$$ \int_{w_0(\phi)} p_0\,dw(\phi) = (1-\alpha)\int_{W(\phi)} p_0\,dW(\phi). $$

Thus on any particular shell the “cap” cut off by the hyperplane x̄ = constant must have an area equal to 1 − α times that of the shell, and hence must subtend the same solid angle at the centre (μ₀, …, μ₀). Consequently the boundaries lie on a right hypercircular cone through the point whose co-ordinates are all equal to μ₀ and whose axis is perpendicular to the hyperplanes x̄ = constant, namely the line x₁ = x₂ = … = xₙ. For each α there will be a different cone. If μ₁ > μ₀ the cones will be in the positive quadrant and in the contrary case in the negative quadrant.

Furthermore, these regions are independent of μ₁. Thus for the class of hypotheses μ₁ > μ₀ or μ₁ < μ₀ (but not both together) common best critical regions and U.M.P. tests exist.

Finally we have to evaluate α in terms of the sample values determining the critical cones. We have already seen in Example 10.6 (vol. I, p. 239) that if z = (x̄ − μ₀)/s, the frequency inside the cone z > z₀ is given by

$$ \frac{\int_{z_0}^{\infty}(1+z^2)^{-n/2}\,dz}{\int_{-\infty}^{\infty}(1+z^2)^{-n/2}\,dz} = 1-\alpha. $$

Thus “Student’s” test, which we have previously considered on more or less intuitive grounds, is now seen to be the best in the sense of the theory herein developed, for the admissible class μ₁ > μ₀ or for that μ₁ < μ₀.
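Using the frequency function quoted from Example 10.6, the size of the cone z > z₀ can be computed by quadrature (z√(n − 1) is the usual t with n − 1 degrees of freedom). The sketch below assumes n ≥ 2 and substitutes z = tan t, so that the infinite range becomes (−½π, ½π) with integrand cos^{n−2} t, then applies Simpson's rule; `z_stat` computes z with divisor-n s. The function names and data are illustrative.

```python
import math

def z_tail(z0, n, steps=2000):
    """P(z > z0) for the density proportional to (1 + z^2)**(-n/2), n >= 2.
    With z = tan(t) the integrand becomes cos(t)**(n - 2) on (-pi/2, pi/2)."""
    def simpson(a, b):
        h = (b - a) / steps                     # steps must be even
        f = lambda t: math.cos(t) ** (n - 2)
        total = f(a) + f(b)
        for i in range(1, steps):
            total += (4 if i % 2 else 2) * f(a + i * h)
        return total * h / 3
    return simpson(math.atan(z0), math.pi / 2) / simpson(-math.pi / 2, math.pi / 2)

def z_stat(xs, mu0):
    """z = (xbar - mu0) / s, with s computed using divisor n."""
    n = len(xs)
    xbar = sum(xs) / n
    s = math.sqrt(sum((x - xbar) ** 2 for x in xs) / n)
    return (xbar - mu0) / s

xs = [4.1, 5.3, 6.0, 4.8]
z = z_stat(xs, 5.0)
print(z, z_tail(z, len(xs)))
```

Rescaling the deviations x − μ₀ by any positive factor leaves z unchanged, which is the statement that z is constant along the rays generating each cone.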


Example 26.7 

Consider a sample from the normal population with unspecified mean, the hypothesis being that σ = σ₀. We now find

$$ \phi = \frac{\partial}{\partial\mu}\log p_0 = \frac{n(\bar x-\mu)}{\sigma_0^2}, \qquad \frac{\partial\phi}{\partial\mu} = -\frac{n}{\sigma_0^2}, $$

so that (26.30) is satisfied.

The hypersurfaces φ = constant are the hyperplanes x̄ = constant, and any regions of size 1 − α on these hyperplanes will provide similar regions w. The condition p_t > kp₀ will be found to reduce to

$$ s^2(\sigma_0^2-\sigma_1^2) < -(\bar x-\mu_1)^2(\sigma_0^2-\sigma_1^2) + 2\sigma_0^2\sigma_1^2\left\{\log\frac{\sigma_0}{\sigma_1} - \frac{1}{n}\log k\right\} = (\sigma_0^2-\sigma_1^2)\,k_1, \text{ say}. $$

If σ₁ > σ₀ we have s² > k₁(φ),
and if σ₁ < σ₀ we have s² < k₁(φ).

Since s² is independent of x̄, k₁ will be a function of α and n only (σ₀ being given). The best critical regions are those given by s² > s_α² and s² < s_α′² as the case may be, and the appropriate values of s_α² corresponding to α may be found from the known distribution of s². The critical regions are hypercylinders, and again there are two sets of common best critical regions, according as σ₁ > σ₀ or σ₁ < σ₀.
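For the case σ₁ > σ₀ the cutoff s_α² can be obtained from the fact that, under H₀, ns²/σ₀² (s² with divisor n) has the χ² distribution with n − 1 degrees of freedom. The sketch evaluates the χ² distribution function through the power series of the regularized incomplete gamma function and finds the upper cutoff by bisection. It is a self-contained illustration intended for moderate degrees of freedom, not a production routine; `size` again denotes the H₀-probability of the critical region (1 − α in the text), and the function names are illustrative.

```python
import math

def chi2_cdf(q, df, tol=1e-14):
    """Regularized lower incomplete gamma P(df/2, q/2), summed as a power series."""
    if q <= 0:
        return 0.0
    a, x = df / 2, q / 2
    term = math.exp(a * math.log(x) - x - math.lgamma(a + 1))   # k = 0 term
    total, k = term, 0
    while term > tol * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total

def upper_cutoff(n, sigma0, size):
    """s_alpha^2 with P(s^2 > s_alpha^2 | sigma0) = size, using n s^2 / sigma0^2 ~ chi2(n - 1)."""
    lo, hi = 0.0, 1.0
    while 1 - chi2_cdf(hi, n - 1) > size:    # bracket the chi-square quantile
        hi *= 2
    for _ in range(100):                     # bisection on the quantile
        mid = (lo + hi) / 2
        if 1 - chi2_cdf(mid, n - 1) > size:
            lo = mid
        else:
            hi = mid
    return hi * sigma0 ** 2 / n

print(upper_cutoff(10, 1.0, 0.05))
```

The critical region is then s² > `upper_cutoff(n, sigma0, size)`, the same for every value of the unspecified mean.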


Composite Hypotheses : Several Degrees of Freedom 

26.27. As a preliminary to extending the theory for one degree of freedom to the case of several degrees, we note that if a region w is similar to W with regard to θ₁ … θ_r jointly, then it is so for each of them separately; and conversely. The direct result is obvious and the converse follows in this way (we need prove it only for r = 2 because the rest follows step by step). If

$$ \int_w p\,dx = 1-\alpha $$

is true, for fixed θ₂, θ₃ … θ_r, independently of θ₁, and, for fixed θ₁, θ₃ … θ_r, independently of θ₂, then it is true for any values of θ₁ and θ₂ and any fixed values of θ₃ … θ_r; and hence it is true independently of θ₁ and θ₂ together.
26.28. An additional preliminary requirement is the concept of independence of a family of surfaces of a parameter. Suppose

$$ f_j(x_1 \ldots x_n, \theta) = C_j, \qquad j = 1, 2 \ldots k < n \qquad (26.46) $$

represents a family of surfaces, where θ and the C's are variable parameters. Let S(θ, C₁ … C_k) be the intersection of these surfaces, or, if k = 1, the surfaces themselves. Consider the family obtained by fixing θ and allowing the C's to vary. Then if any surface of this family for θ₁ can also be obtained from a second family for θ₂, we shall say that the family is independent of θ: we get the same aggregate of intersections however θ is chosen. For example, if

$$ f_1 = (x_1-\theta)^2 + (x_2-\theta)^2 + (x_3-\theta)^2 = C_1 $$

and

$$ f_2 = x_1 + x_2 + x_3 = C_2, $$

the family S consists of circles in planes at right angles to the line x₁ = x₂ = x₃ and having their centres on that line. This is true however θ is chosen, and S is therefore independent of θ.
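The example can be checked numerically. On a circle of radius r lying in the plane x₁ + x₂ + x₃ = C₂ and centred on the line x₁ = x₂ = x₃, f₁ takes the single value 3(C₂/3 − θ)² + r² for every θ, so the same circle is a member of the family whatever θ is chosen; only the label C₁ changes. The helper names and numerical values below are illustrative.

```python
import math

def f1(p, theta):
    """f1 = (x1 - theta)^2 + (x2 - theta)^2 + (x3 - theta)^2."""
    return sum((x - theta) ** 2 for x in p)

def circle_point(c2, radius, angle):
    """Point of the circle in the plane x1 + x2 + x3 = c2, centred on the line x1 = x2 = x3."""
    u = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)                 # orthonormal pair, both
    w = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))   # orthogonal to (1, 1, 1)
    return [c2 / 3 + radius * (math.cos(angle) * ui + math.sin(angle) * wi)
            for ui, wi in zip(u, w)]

c2, radius = 1.5, 0.8
points = [circle_point(c2, radius, a) for a in (0.0, 1.0, 2.5, 4.0)]
for theta in (0.0, 1.0, -2.5):
    # one value of f1 on the whole circle, whatever theta is
    print(theta, [round(f1(p, theta), 12) for p in points])
```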

26.29. Under certain restrictive conditions similar to those of 26.24 it is now possible to find solutions to the problem of determining best critical regions. We assume

(1) that ∂^k p₀/∂θ_j^k exists almost everywhere for all k and j = 1 … r;

(2) that if φ_j = ∂ log p₀/∂θ_j, then

$$ \frac{\partial\phi_j}{\partial\theta_j} = A_j + B_j\phi_j, \qquad (26.47) $$

where A_j and B_j do not depend on the x's;

(3) that the family of surfaces given by the intersections of the hypersurfaces φ_j = constant is independent of θ_j for j = 1 … r.

Subject to these conditions (which are sufficient but not necessary) similar regions exist. Consider any two of the parameters, θ₁ and θ₂. Since w is similar with respect to θ₁ alone, we may find surfaces φ₁ = constant and

$$ \int_{w(\phi_1)} p\,dw(\phi_1) = (1-\alpha)\int_{W(\phi_1)} p\,dW(\phi_1). \qquad (26.48) $$

In accordance with assumption (3), the family of surfaces φ₁ = C₁ is independent of θ₂. Thus if θ₂ varies, W(φ₁) and w(φ₁) will not vary, though perhaps they may correspond to other values of C₁. Furthermore, (26.48) is true regardless of θ₂. Hence within the shell φ₁ = constant we can repeat the analysis used for one degree of freedom. We find that the necessary and sufficient condition for w to be similar to W with regard to both θ₁ and θ₂ is

$$ \int_{w(\phi_1,\phi_2)} p_0\,dw(\phi_1,\phi_2) = (1-\alpha)\int_{W(\phi_1,\phi_2)} p_0\,dW(\phi_1,\phi_2), \qquad (26.49) $$

where W(φ₁, φ₂) is the intersection of φ₁ = C₁, φ₂ = C₂ for any values of C₁ and C₂; and similarly for w.

As before, the most general region w is obtained by amalgamating the portions of size 1 − α on the intersections of φ₁ and φ₂. The generalisation to r degrees of freedom is
immediate. It also follows in the usual way that the best critical region w₀ is the one for which

$$ \int_{w_0} p_t\,dx \ge \int_v p_t\,dx, $$

v being any other similar region of size 1 − α; and it is defined by

$$ p_t > h(\phi_1 \ldots \phi_r)\,p_0. \qquad (26.50) $$

The following examples will illustrate the theory.


Example 26.8. Ratio of Two Variances 

Suppose we have two samples of n₁, n₂ members from independent normal populations whose means and variances are unknown. The joint distribution may be expressed as

$$ p \propto \frac{1}{\sigma_1^{n_1}\sigma_2^{n_2}}\exp\left[-\frac{n_1\{(\bar x_1-\mu_1)^2+s_1^2\}}{2\sigma_1^2} - \frac{n_2\{(\bar x_2-\mu_2)^2+s_2^2\}}{2\sigma_2^2}\right]. $$

Consider the composite hypothesis σ₁ = σ₂ = σ, say. This has three degrees of freedom, for μ₁, μ₂ and σ are unspecified. As the alternative we will take

θ₁ = μ₁, θ₂ = μ₂ − μ₁ = b₁, θ₃ = σ₁, θ₄ = σ₂/σ₁,

and for H₀ itself

θ₁ = μ, θ₂ = b, θ₃ = σ, θ₄ = 1.

We have first to consider whether the conditions of 26.29 are satisfied.

(1) Evidently p₀ is differentiable for all parameters any number of times.

(2) We find

$$ \phi_1 = \frac{\partial}{\partial\mu}\log p_0 = \frac{1}{\sigma^2}\{n_1(\bar x_1-\mu) + n_2(\bar x_2-\mu-b)\}, $$

$$ \phi_2 = \frac{\partial}{\partial b}\log p_0 = \frac{n_2}{\sigma^2}(\bar x_2-\mu-b), $$

$$ \phi_3 = \frac{\partial}{\partial\sigma}\log p_0 = -\frac{n_1+n_2}{\sigma} + \frac{1}{\sigma^3}\{n_1(\bar x_1-\mu)^2 + n_2(\bar x_2-\mu-b)^2 + n_1s_1^2 + n_2s_2^2\}, $$

and (26.47) is seen to be satisfied.

(3) The hypersurfaces φ₁ = C₁ are evidently equivalent to

n₁x̄₁ + n₂x̄₂ = C₁′,

where C₁′ is an arbitrary parameter. The hypersurfaces φ₂ = C₂ give similarly

x̄₂ = C₂′.

Both these are independent of θ₂, and their intersections, namely x̄₁ = constant, x̄₂ = constant, are independent of θ₃. Thus the third condition is fulfilled and we may apply the foregoing theory.

The equations φ₁ = constant, φ₂ = constant, φ₃ = constant are equivalent to

x̄₁ = constant,
x̄₂ = constant,
n₁s₁² + n₂s₂² = constant = (n₁ + n₂)s₀², say.

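The element w₀ is that part of W(φ₁, φ₂, φ₃) within which

$$ p_t > h(\phi_1,\phi_2,\phi_3)\,p_0, $$

and this condition, by reference to the frequency function, becomes

$$ \frac{1}{\sigma_1^{n_1}(\theta_4\sigma_1)^{n_2}}\exp\left[-\frac{n_1\{(\bar x_1-\mu_1)^2+s_1^2\}}{2\sigma_1^2} - \frac{n_2\{(\bar x_2-\mu_1-b_1)^2+s_2^2\}}{2\theta_4^2\sigma_1^2}\right] > \frac{h}{\sigma^{n_1+n_2}}\exp\left[-\frac{n_1\{(\bar x_1-\mu)^2+s_1^2\} + n_2\{(\bar x_2-\mu-b)^2+s_2^2\}}{2\sigma^2}\right]. $$

Since the region w₀ is independent of μ, b and σ, we may put them respectively equal to μ₁, b₁ and σ₁ and hence find for the condition

$$ n_2(1-\theta_4^2)\{(\bar x_2-\mu_1-b_1)^2+s_2^2\} < -2\sigma_1^2\theta_4^2(\log h + n_2\log\theta_4). $$

Since this inequality holds good on x̄₂ = constant it contains only one variable, s₂², and we accordingly find two cases:

If θ₄ = σ₂/σ₁ > 1 the best region is defined by s₂² > h₁(x̄₁, x̄₂, s₀²);
if θ₄ = σ₂/σ₁ < 1 the best region is defined by s₂² < h₂(x̄₁, x̄₂, s₀²).

We have now to determine h₁ so as to satisfy

$$ \int_{w_0(\phi_1,\phi_2,\phi_3)} p_0\,dx = (1-\alpha)\int_{W(\phi_1,\phi_2,\phi_3)} p_0\,dx. $$

Now W(φ₁, φ₂, φ₃) is the locus for which x̄₁, x̄₂ and s₀² are constant, and thus the integral on the right is the product of 1 − α and the frequency function p₀(x̄₁, x̄₂, s₀²). Similarly that on the left is the integral of the joint frequency function of x̄₁, x̄₂, s₀² and s₂² over the region for which s₂² ≥ h₁. Thus

$$ \int_{w_0} p_0\,dx = \int_{h_1}^{(n_1+n_2)s_0^2/n_2} p_0(\bar x_1,\bar x_2,s_0^2,s_2^2)\,ds_2^2 $$

in the first case, with a similar expression but different limits in the second. Now we have for the joint frequency function of x̄₁, x̄₂, s₁² and s₂²

$$ p_0 \propto (s_1^2)^{\frac{n_1-3}{2}}(s_2^2)^{\frac{n_2-3}{2}}\exp\left[-\frac{1}{2\sigma^2}\{n_1(\bar x_1-\mu)^2 + n_2(\bar x_2-\mu-b)^2 + (n_1+n_2)s_0^2\}\right]. $$

Transforming from s₁² to s₀² as variable, we find for the condition, after a little reduction,

$$ \int_{h_1}^{(n_1+n_2)s_0^2/n_2}\{(n_1+n_2)s_0^2-n_2s_2^2\}^{\frac{n_1-3}{2}}(s_2^2)^{\frac{n_2-3}{2}}\,ds_2^2 = (1-\alpha)\int_{0}^{(n_1+n_2)s_0^2/n_2}\{(n_1+n_2)s_0^2-n_2s_2^2\}^{\frac{n_1-3}{2}}(s_2^2)^{\frac{n_2-3}{2}}\,ds_2^2. $$

On substituting n₂s₂² = (n₁ + n₂)s₀²u we find

$$ \int_{h''}^{1} u^{\frac{n_2-3}{2}}(1-u)^{\frac{n_1-3}{2}}\,du = (1-\alpha)\int_{0}^{1} u^{\frac{n_2-3}{2}}(1-u)^{\frac{n_1-3}{2}}\,du, $$

where h″ = n₂h₁/{(n₁ + n₂)s₀²}. It follows that h″ depends only on α, n₁ and n₂. Thus, whatever the values of x̄₁, x̄₂ and s₀², the best critical region is defined by

$$ \frac{n_2s_2^2}{(n_1+n_2)s_0^2} > h_1'' \quad\text{if } \sigma_2 > \sigma_1, \qquad \frac{n_2s_2^2}{(n_1+n_2)s_0^2} < h_2'' \quad\text{if } \sigma_2 < \sigma_1. $$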
These are equivalent to

$$ u = \frac{n_2s_2^2}{n_1s_1^2+n_2s_2^2} > h_1'' \quad\text{if } \sigma_2 > \sigma_1, \qquad u < h_2'' \quad\text{if } \sigma_2 < \sigma_1. $$

If we put

$$ e^{2z} = \frac{n_2(n_1-1)s_2^2}{n_1(n_2-1)s_1^2}, $$

the B-distribution of u reduces to Fisher's form. The result we have reached is therefore equivalent to showing that the z-test is the best for the ratio of two variances in normal samples. As usual, there is no U.M.P. test for the whole range of the ratio from 0 to ∞, but two U.M.P. tests for the ranges 0 to 1 and 1 to ∞ respectively.
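The statistics of this example can be computed as below, assuming s² with divisor n as in the text. The sketch returns u, e^{2z} (the variance ratio nowadays usually written F) and Fisher's z; the identity u/(1 − u) = e^{2z}(n₂ − 1)/(n₁ − 1) shows that u is an increasing function of z, so a region u > h″ is the same as a region z > constant. The function name and data are illustrative.

```python
import math

def variance_ratio_stats(x1, x2):
    """u, e^{2z} and z of Example 26.8, with divisor-n variances s1^2, s2^2."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    s1sq = sum((x - m1) ** 2 for x in x1) / n1
    s2sq = sum((x - m2) ** 2 for x in x2) / n2
    u = n2 * s2sq / (n1 * s1sq + n2 * s2sq)
    e2z = n2 * (n1 - 1) * s2sq / (n1 * (n2 - 1) * s1sq)
    return u, e2z, 0.5 * math.log(e2z)

x1, x2 = [1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 4.0, 6.0, 8.0]
u, e2z, z = variance_ratio_stats(x1, x2)
# u / (1 - u) equals e^{2z} (n2 - 1) / (n1 - 1)
print(u, e2z, z, u / (1 - u), e2z * (len(x2) - 1) / (len(x1) - 1))
```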


Example 26.9. Difference of Two Means 

Consider again the previous example, where now the variances are unspecified but 
equal and the means and — fii f b may have any values. The hypothesis Hq is that 
6=0 and has two degrees of freedom corresponding to p and cr. 

Let the alternative specify the parameters 

^2 ^3 

In addition to the quantities required in the previous Example we now use also Xq and 
djj, the mean and variance of the pooled samples. 

We find that the three conditions of 26.29 are satisfied, and 


/T ^ 


{ (X. - + •»?}. 

O’ (7 


Equivalent to this family are the surfaces 

=- Cl 
5* = Cl. 

The condition $p_1 > k(\phi_1, \phi_2)\,p_0$ reduces to

$$\delta(\bar{x}_2-\bar{x}_1) > k'(\bar{x}_0, s_0^2),$$

and as usual we find two cases according as $\mu_2 > \mu_1$ or vice versa. We consider only the first, the second being analogous.

Writing $v = \bar{x}_2-\bar{x}_1$, the region is $v > h'$, and we have to determine $h'$ by

$$\int_{h'}^{h'''}p_0(\bar{x}_0, s_0^2, v)\,dv = (1-\alpha)\int_{h''}^{h'''}p_0(\bar{x}_0, s_0^2, v)\,dv,$$

where $h''$ and $h'''$ are the lower and upper limits of the variation of $v$ for fixed values of $\bar{x}_0$ and $s_0^2$.

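The frequency function of $\bar{x}_0$, $v$ and $n_1s_1^2+n_2s_2^2$ is easily found to be

$$\propto (n_1s_1^2+n_2s_2^2)^{\frac{n_1+n_2-4}{2}}\exp\left[-\frac{1}{2\sigma^2}\left\{(n_1+n_2)(\bar{x}_0-\mu)^2+\frac{n_1n_2}{n_1+n_2}v^2+n_1s_1^2+n_2s_2^2\right\}\right],$$

where $n_1s_1^2+n_2s_2^2 = (n_1+n_2)s_0^2-\dfrac{n_1n_2}{n_1+n_2}v^2$, whence that of $\bar{x}_0$, $s_0^2$ and $v$ is found to be

$$p_0(\bar{x}_0, s_0^2, v) \propto \left\{(n_1+n_2)s_0^2-\frac{n_1n_2}{n_1+n_2}v^2\right\}^{\frac{n_1+n_2-4}{2}}\exp\left[-\frac{n_1+n_2}{2\sigma^2}\left\{(\bar{x}_0-\mu)^2+s_0^2\right\}\right].$$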
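Since $\bar{x}_0$ and $s_0^2$ are constant over the domains under consideration, we have to satisfy

$$\int_{h'}^{h'''}\left\{(n_1+n_2)s_0^2-\frac{n_1n_2}{n_1+n_2}v^2\right\}^{\frac{n_1+n_2-4}{2}}dv = (1-\alpha)\int_{h''}^{h'''}\left\{(n_1+n_2)s_0^2-\frac{n_1n_2}{n_1+n_2}v^2\right\}^{\frac{n_1+n_2-4}{2}}dv,$$

where $h''' = -h'' = (n_1+n_2)s_0/\sqrt{n_1n_2}$. If we put

$$v = \frac{(n_1+n_2)s_0}{\sqrt{n_1n_2}}\,\frac{z}{\sqrt{1+z^2}}, \qquad \text{so that} \qquad \frac{1}{1+z^2} = 1-\frac{n_1n_2v^2}{(n_1+n_2)^2s_0^2},$$

this reduces to

$$\int_{z'}^{\infty}\frac{dz}{(1+z^2)^{\frac{n_1+n_2-1}{2}}} = (1-\alpha)\int_{-\infty}^{\infty}\frac{dz}{(1+z^2)^{\frac{n_1+n_2-1}{2}}},$$

and

$$z = \frac{v}{\sqrt{n_1s_1^2+n_2s_2^2}}\sqrt{\frac{n_1n_2}{n_1+n_2}} = \frac{t}{\sqrt{n_1+n_2-2}},$$

where $t$ is “Student’s” $t$ with $n_1+n_2-2$ degrees of freedom.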

We have thus arrived at the $t$-test for the difference of two means in normal variation when the variances are equal. Once again the test we introduced on more or less intuitive grounds has been shown to be justified in the light of the theory developed in this chapter.
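The reduction to “Student’s” $t$ can be checked numerically. The sketch below, with hypothetical function names, computes $t$ from the within-sample sums of squares and verifies the substitution used above, $z/\sqrt{1+z^2} = v\sqrt{n_1n_2}/\{(n_1+n_2)s_0\}$, where $s_0$ is the standard deviation of the pooled samples with divisor $n_1+n_2$, as in the text.

```python
import math


def two_sample_t(x1, x2):
    """Student's t for H0: mu1 = mu2 with equal variances (Example 26.9)."""
    n1, n2 = len(x1), len(x2)
    m1, m2 = sum(x1) / n1, sum(x2) / n2
    # n1 s1^2 + n2 s2^2, the within-sample sum of squares
    within = sum((v - m1) ** 2 for v in x1) + sum((v - m2) ** 2 for v in x2)
    v = m2 - m1
    t = v * math.sqrt(n1 * n2 / (n1 + n2)) / math.sqrt(within / (n1 + n2 - 2))
    return t, n1 + n2 - 2


def check_substitution(x1, x2):
    """Return both sides of z / sqrt(1 + z^2) = v sqrt(n1 n2) / ((n1 + n2) s0)."""
    n1, n2 = len(x1), len(x2)
    t, df = two_sample_t(x1, x2)
    z = t / math.sqrt(df)
    pooled = list(x1) + list(x2)
    m0 = sum(pooled) / len(pooled)
    s0 = math.sqrt(sum((w - m0) ** 2 for w in pooled) / len(pooled))
    v = sum(x2) / n2 - sum(x1) / n1
    lhs = z / math.sqrt(1.0 + z * z)
    rhs = v * math.sqrt(n1 * n2) / ((n1 + n2) * s0)
    return lhs, rhs
```

For the samples (1, 2, 3) and (4, 5, 6), $t = 3\sqrt{1.5} \approx 3.674$ with 4 degrees of freedom, and the two sides of the substitution agree to rounding error.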


Linear Hypotheses in Normal Variation 

26.30. Several of the hypotheses dealt with in foregoing examples are particular 
cases of a general class known as linear hypotheses, which accounts for the fact that we 
keep arriving at the same sort of conclusions respecting them. 

Suppose we have $n$ independent variates, typified by $x_k$, distributed in the normal form

$$dF_k = \frac{1}{\sigma\sqrt{2\pi}}\exp\left\{-\frac{(x_k-\mu_k)^2}{2\sigma^2}\right\}dx_k,$$

with common variance $\sigma^2$ but different means. Suppose the means are connected with $r+s$ unknown parameters $\theta_1, \ldots, \theta_r, \ldots, \theta_{r+s}$ by linear equations of the type

$$\mu_k = \sum_{j=1}^{r+s}c_{jk}\,\theta_j, \qquad k = 1, \ldots, n. \qquad (26.51)$$

Suppose further that the hypothesis $H_0$ specifies $r$ parameters,

$$\theta_1 = \theta_1', \quad \ldots, \quad \theta_r = \theta_r',$$

and hence is composite with $s$ degrees of freedom. Then $H_0$ will be called a “linear hypothesis”. The reader can verify for himself that “Student’s” hypothesis, and the hypothesis as to the difference of two means when variances are equal, are of this type. The homogeneity test in variance-analysis and the test of regression coefficients are also reducible to the same form. Of course, if $H_0$ specifies $r$ linear relations among the $\theta$'s instead of the $\theta$'s themselves, it can be reduced to a hypothesis which specifies the $\theta$'s directly, except perhaps in degenerate cases which need not detain us.

26.31. The theory developed in the earlier part of the chapter for composite 
hypotheses may be applied to linear hypotheses as we have defined them, and the argument 
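follows exactly that of Examples 26.8 and 26.9. It is readily verified that the three conditions of 26.29 are satisfied. Writing $\phi_j = \partial\log p_0/\partial\theta_j$ and $\phi_\sigma = \partial\log p_0/\partial\sigma$, we have

$$\frac{\partial\phi_j}{\partial\theta_l} = -\frac{1}{\sigma^2}\sum_{k=1}^{n}c_{jk}c_{lk}, \quad \text{independent of the } x\text{'s}, \qquad j, l = r+1, \ldots, r+s, \qquad (26.52)$$

$$\frac{\partial\phi_\sigma}{\partial\sigma} = -\frac{2n}{\sigma^2}-\frac{3}{\sigma}\phi_\sigma. \qquad (26.53)$$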


We can therefore find similar regions $w(\phi_{r+1}, \ldots, \phi_{r+s}, \phi_\sigma)$ and select from them the best critical regions in the usual manner. We will omit the rather cumbrous algebra and quote the following result (Kolodzieczyk, 1935).

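Transform to new variates $\xi_1, \ldots, \xi_{r+s}, y_{r+s+1}, \ldots, y_n$ by the equation

$$x_k = \sum_{j=1}^{r+s}c_{jk}\,\xi_j+\sum_{j=r+s+1}^{n}c_{jk}\,y_j, \qquad (26.54)$$

where the $c$'s are those given in (26.51) for $j \le r+s$ and the other $c$'s are orthogonal, i.e.

$$\sum_{i=1}^{n}c_{ki}c_{ji} = 0 \quad (k \ne j,\ j > r+s), \qquad \sum_{i=1}^{n}c_{ji}^2 = 1 \quad (j > r+s). \qquad (26.55)$$

Define

$$nS_0^2 = \sum_{j=r+s+1}^{n}y_j^2, \qquad (26.56)$$

and consider the quadratic form

$$\sum_{k=1}^{n}\left\{\sum_{j=1}^{r+s}c_{jk}(\xi_j-\theta_j)\right\}^2. \qquad (26.57)$$

A further transformation of $\xi_{r+1}, \ldots, \xi_{r+s}$ is now made to variables $\eta_{r+1}, \ldots, \eta_{r+s}$ such that (26.57) becomes

$$\sum_{j,k=1}^{r}R_{jk}(\xi_j-\theta_j)(\xi_k-\theta_k)+\sum_{k=r+1}^{r+s}\eta_k^2, \qquad (26.58)$$

the $\eta$'s vanishing when $\theta_{r+1}, \ldots, \theta_{r+s}$ take their least-squares values for given $\theta_1, \ldots, \theta_r$. We then write

$$nS_1^2 = \sum_{j,k=1}^{r}R_{jk}(\xi_j-\theta_j')(\xi_k-\theta_k'). \qquad (26.59)$$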

The coefficients $R$ can, of course, be obtained from the $c$'s by ordinary determinantal algebra.

Writing now $\delta_j = \theta_j-\theta_j'$, i.e. the difference between $\theta_j$ on the alternative hypothesis and its value if $H_0$ is true, we find that the best critical region is given by

$$v = \frac{\sum_{j,k=1}^{r}R_{jk}\,\delta_j(\xi_k-\theta_k')}{\sqrt{n(S_0^2+S_1^2)}\,\sqrt{\sum_{j,k=1}^{r}R_{jk}\,\delta_j\delta_k}} > v_0, \qquad (26.60)$$

where $v$ is distributed in the form

$$dF \propto (1-v^2)^{\frac{n-s-3}{2}}\,dv, \qquad -1 \le v \le 1, \qquad (26.61)$$

and $v_0$ is given by

$$1-\alpha = \int_{v_0}^{1}dF. \qquad (26.62)$$


26.32. There is one interesting conclusion to be drawn from (26.60). If a U.M.P. test exists, $v$ should be independent of the $\theta_j$ and hence of the $\delta_j$. This appears to be possible only if the square root in the denominator of (26.60) is a linear function of the $\delta$'s. But from (26.59) the $R_{jk}$ are the coefficients of a positive definite form, and the square root of such a form is linear in the $\delta$'s only if $r = 1$. We conclude that if $r \ge 2$ no U.M.P. test is possible for linear hypotheses in normal variation.

We have already seen that, under general conditions, no U.M.P. test exists for a single parameter when alternatives on both sides are admitted. A similar conclusion follows from (26.60) if $r = 1$, for it then becomes

$$\frac{\delta_1}{|\delta_1|}\,\frac{\sqrt{R_{11}}\,(\xi_1-\theta_1')}{\sqrt{n(S_0^2+S_1^2)}} > v_0, \qquad (26.63)$$

which, as usual, leads to two cases according as $\delta_1 > 0$ or $\delta_1 < 0$.


26.33. We will pause at this point to review our results. We began by defining two kinds of error and showing that a test could be defined as “best” for a single alternative hypothesis if it controlled the first kind and reduced the second to a minimum. When there is a class of admissible alternatives we may sometimes arrive at a U.M.P. test which will minimise errors of the second kind for any member of the class, and such a test may be regarded as the best attainable. Though the U.M.P. test does not exist in the great majority of cases, we may find tests which are U.M.P. for either $\theta_1 > \theta_0$ or $\theta_1 < \theta_0$. Such tests have been reached for “Student’s” hypothesis and several others in common use, and are found to give the same tests as those introduced on rather intuitive grounds in Chapter 21.


26.34. The absence of a U.M.P. test implies that in the majority of cases we have to look for other criteria to provide “best” tests. In the remainder of this chapter and in the next we shall consider several lines of approach which have been developed:

(a) Relying on 26.18 we may evolve tests based on the likelihood ratio. These will give U.M.P. tests if such exist, and in the contrary case will do their best, so to speak, by finding the greatest common denominator among the best critical regions.

(b) We may consider the properties of tests when the sample number $n$ tends to infinity, and so obtain tests which are U.M.P. in the limit. Such tests, like maximum likelihood estimators, may be employed on the grounds that they are “best” for large $n$ and presumably good for small $n$.

(c) We may derive a new criterion from the concept of bias in statistical tests, which will be explained in the next chapter.

(d) Recognizing that there is no test which is U.M.P. everywhere, we may seek for one which is U.M.P. in the neighbourhood of the true value. The idea behind this approach is that it will be more important to detect errors in the neighbourhood of the true value, and that large errors may be left to look after themselves, either because they are infrequent or because almost any “reasonable” test will reveal them.*

(e) When a number of independent parameters are involved, we may abandon the attempt to test for each separately and confine our attention to the class of hypotheses for which they are functionally related, e.g. by $\psi = f(\theta_1, \ldots, \theta_k)$. This reduces our problem to the case of a single parameter $\psi$, and we may be able to show that a particular $\psi$ is the best, in the sense that it is U.M.P. with respect to all other $\psi$'s, that is, to all other tests depending on a single function of the unknowns.

We proceed to consider these.


Tests Based on Likelihood


26.35. Suppose that for a given member of a composite hypothesis $H_0$ the joint sampling distribution of the variables $x_1, \ldots, x_n$ has a frequency function $p_0$ (which is, of course, the likelihood). Considering the $x$'s as fixed, we may examine the variation of $p_0$ according to variation in the unspecified parameters $\theta_{r+1}, \ldots, \theta_{r+s}$, which form a set, say $\omega$. Let $p_0(\omega \text{ max.})$ be the maximum value of $p_0$ for such variation. Similarly, if $\Omega$ is the class of admissible alternatives $H_1$, let $p_1(\Omega \text{ max.})$ be the maximum of the likelihood for variations of all the parameters $\theta_1, \ldots, \theta_{r+s}$. Write

$$\lambda = \frac{p_0(\omega \text{ max.})}{p_1(\Omega \text{ max.})}. \qquad (26.64)$$

Then a possible criterion for testing $H_0$ is to take as critical regions those points for which

$$\lambda < \text{constant} = C, \text{ say}, \qquad (26.65)$$

where $C$ is determined by relation to a probability level $\alpha$ from the sampling distribution of $\lambda$, which, of course, is independent of the unknown parameters. In defining $\lambda$ we have assumed that the maxima on the right of (26.64) exist, but we can give the equation greater generality by taking $p_0(\omega \text{ max.})$ as the upper bound of values of $p_0$ in the set $\omega$ where no maximum exists; and so for $\Omega$.

In this form the criterion states that we are to accept $H_0$ if the maximum likelihood in the set of permissible $H_0$'s is greater than a specified proportion of that in the set of alternatives $H_1$. In doing so we control the first kind of error in the ordinary way. So far as concerns the second kind of error, we saw in 26.18 that for $H_0$ simple the criterion provided a sort of highest common factor among available tests; and presumably qualities of this kind will be equally useful when $H_0$ is composite.
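As a concrete case of (26.64), take $H_0$: $\mu = \mu_0$ for a single normal sample with $\sigma$ unspecified. The sketch below (our own function names; standard library only) evaluates $\lambda$ directly from the two maximized likelihoods and compares it with the closed form $\lambda = \{1+t^2/(n-1)\}^{-n/2}$, $t$ being “Student’s” ratio. The agreement illustrates why the criterion $\lambda < C$ is equivalent to $|t| >$ constant (cf. Exercise 26.7).

```python
import math


def normal_loglik(xs, mu, var):
    """Log-likelihood of a normal sample with mean mu and variance var."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - mu) ** 2 for x in xs) / (2 * var)


def lambda_one_mean(xs, mu0):
    """Likelihood ratio (26.64) for H0: mu = mu0, sigma unspecified."""
    n = len(xs)
    xbar = sum(xs) / n
    var_omega = sum((x - mu0) ** 2 for x in xs) / n   # ML estimate of sigma^2 within omega
    var_Omega = sum((x - xbar) ** 2 for x in xs) / n  # ML estimate over the whole of Omega
    log_lam = normal_loglik(xs, mu0, var_omega) - normal_loglik(xs, xbar, var_Omega)
    return math.exp(log_lam)


def lambda_from_t(xs, mu0):
    """The same ratio written in terms of Student's t with n - 1 degrees of freedom."""
    n = len(xs)
    xbar = sum(xs) / n
    s_unbiased = math.sqrt(sum((x - xbar) ** 2 for x in xs) / (n - 1))
    t = (xbar - mu0) * math.sqrt(n) / s_unbiased
    return (1 + t * t / (n - 1)) ** (-n / 2)
```

For the sample (1, 2, 3, 6) and $\mu_0 = 0$ both functions give $\lambda = (3.5/12.5)^2 = 0.0784$.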


The Problem of k Samples 

26.36. We will illustrate the theory of the likelihood tests by discussing a problem of considerable practical importance. Suppose we have a sample from each of $k$ normal populations, $x_{ij}$ being the $j$th member of the $i$th sample. Let

$n_i$ be the number in the $i$th sample;

$N = \sum_{i=1}^{k}n_i$ be the total number of observations;

$\bar{x}_i$ be the mean of the $i$th sample;

$s_i^2$ be the variance of the $i$th sample.

* An alternative line would be to concentrate on errors of the second kind for larger deviations,
on the ground that large errors are more important than small ones. I understand from Dr. B. L. Welch 
that he considered this approach shortly before the war ; the results did not differ very materially from 
those given by requiring optimum properties near the true value in the case he examined, and the 
results were not published. 
Consider three different hypotheses $H_0$:

(1) $H$, that all populations are the same and hence have the same unspecified mean and unspecified variance.

(2) $H_1$, that they have the same variance but different unspecified means $\mu_1, \ldots, \mu_k$.

(3) $H_2$, when it is known that they have the same variance, that they have the same means.


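We have for the joint likelihood

$$p = \frac{1}{(2\pi)^{N/2}\prod_{i=1}^{k}\sigma_i^{n_i}}\exp\left[-\sum_{i=1}^{k}\frac{n_i\{(\bar{x}_i-\mu_i)^2+s_i^2\}}{2\sigma_i^2}\right].$$

Consider first of all $H$. We find, for $p(\Omega \text{ max.})$,

$$\hat{\mu}_i = \bar{x}_i, \qquad (26.66)$$

$$\hat{\sigma}_i^2 = s_i^2, \qquad (26.67)$$

and for $p(\omega \text{ max.})$, putting all the $\mu$'s and $\sigma$'s equal and equating the first partials of $\log p_0$ to zero,

$$\hat{\mu} = \bar{x}_0 = \frac{1}{N}\sum_{i=1}^{k}n_i\bar{x}_i,$$

$$\hat{\sigma}^2 = s_0^2 = \frac{1}{N}\sum_{i=1}^{k}n_i\left\{(\bar{x}_i-\bar{x}_0)^2+s_i^2\right\}.$$

Inserting these values in $p$ we find, after a little reduction,

$$\lambda_H = \prod_{i=1}^{k}\left(\frac{s_i^2}{s_0^2}\right)^{\frac{1}{2}n_i}. \qquad (26.68)$$

Similarly it may be shown that

$$\lambda_{H_1} = \prod_{i=1}^{k}\left(\frac{s_i^2}{s_a^2}\right)^{\frac{1}{2}n_i}, \qquad (26.69)$$

where

$$s_a^2 = \frac{1}{N}\sum_{i=1}^{k}n_is_i^2, \qquad (26.70)$$

and also that

$$\lambda_{H_2} = \left(\frac{s_a^2}{s_0^2}\right)^{\frac{1}{2}N}. \qquad (26.71)$$

It will be noticed that

$$\lambda_H = \lambda_{H_1}\lambda_{H_2}. \qquad (26.72)$$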

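26.37. The function $\lambda_{H_2}$ may be related to the correlation ratio $\eta^2$. We have

$$s_0^2 = s_a^2+\frac{1}{N}\sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x}_0)^2, \qquad (26.74)$$

and hence

$$\lambda_{H_2} = \left\{1-\frac{\sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x}_0)^2}{Ns_0^2}\right\}^{\frac{1}{2}N} \qquad (26.75)$$

$$= (1-\eta^2)^{\frac{1}{2}N}. \qquad (26.76)$$

The distribution of $\lambda_{H_2}$ is thus obtainable directly from the known form for $\eta^2$ in samples from an uncorrelated population.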
Thus

$$(\lambda_{H_2})^{2/N} = 1-\eta^2 = \frac{s_a^2}{s_0^2}, \qquad (26.77)$$

and the distribution of $(\lambda_{H_2})^{2/N}$ is that of $1-\eta^2$, where the distribution of $\eta^2$ is

$$dF \propto (\eta^2)^{\frac{k-3}{2}}(1-\eta^2)^{\frac{N-k-2}{2}}\,d(\eta^2). \qquad (26.78)$$

It can accordingly be tested in this distribution or the related $z$-form. This is, in fact, the criterion used in the analysis of variance for homogeneity tests, and it is interesting to remark that the $z$-test here arises in considering the hypothesis that the various distributions parent to the sample values, being already known to have the same variance, have the same mean. The other form of hypothesis, $H$, is that the samples come from the same population, and the equality of variance is not part of the data but part of the hypothesis. We are not then surprised, or should not be, to find that the $\lambda_H$ criterion leads to a different test.
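The three criteria of 26.36 and the relation (26.76) are easily computed. The following sketch (a hypothetical helper; variances with divisor $n_i$, as in the text, and each sample assumed to have positive variance) returns $\lambda_H$, $\lambda_{H_1}$, $\lambda_{H_2}$ and $\eta^2$, working with logarithms to avoid underflow when $N$ is large.

```python
import math


def k_sample_criteria(samples):
    """lambda_H, lambda_H1, lambda_H2 of 26.36 and eta^2 of 26.37."""
    ns = [len(s) for s in samples]
    N = sum(ns)
    means = [sum(s) / len(s) for s in samples]
    variances = [sum((x - m) ** 2 for x in s) / len(s) for s, m in zip(samples, means)]
    grand = sum(n * m for n, m in zip(ns, means)) / N                    # x-bar_0
    s_a2 = sum(n * v for n, v in zip(ns, variances)) / N                 # (26.70)
    s_02 = s_a2 + sum(n * (m - grand) ** 2 for n, m in zip(ns, means)) / N  # (26.74)
    # logarithms of (26.68), (26.69) and (26.71)
    log_lam_H = sum(0.5 * n * math.log(v / s_02) for n, v in zip(ns, variances))
    log_lam_H1 = sum(0.5 * n * math.log(v / s_a2) for n, v in zip(ns, variances))
    log_lam_H2 = 0.5 * N * math.log(s_a2 / s_02)
    eta2 = 1.0 - s_a2 / s_02   # between-sample over total sum of squares
    return {
        "lambda_H": math.exp(log_lam_H),
        "lambda_H1": math.exp(log_lam_H1),
        "lambda_H2": math.exp(log_lam_H2),
        "eta2": eta2,
        "N": N,
    }
```

For any data one may check that $\lambda_H = \lambda_{H_1}\lambda_{H_2}$ and that $\lambda_{H_2}^{2/N} = 1-\eta^2$, to rounding error.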

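26.38. The moments of the distribution of $\lambda_H$ may be obtained as follows. The joint distribution of the $s_i$ and $\bar{x}_i$ is, on $H$,

$$dF \propto \prod_{i=1}^{k}s_i^{n_i-2}\exp\left[-\frac{n_i}{2\sigma^2}\left\{s_i^2+(\bar{x}_i-\mu)^2\right\}\right]\prod_{i=1}^{k}ds_i\,d\bar{x}_i. \qquad (26.79)$$

The distribution of means is independent of that of variances and can be ignored. Further, if

$$\chi^2 = \frac{1}{\sigma^2}\sum_{i=1}^{k}n_i(\bar{x}_i-\bar{x}_0)^2,$$

then $\chi^2$ is also independent of the variances, and we have

$$dF \propto \prod_{i=1}^{k}s_i^{n_i-2}\exp\left(-\frac{n_is_i^2}{2\sigma^2}\right)\chi^{k-2}\exp\left(-\tfrac{1}{2}\chi^2\right)\prod_{i=1}^{k}ds_i\,d\chi. \qquad (26.80)$$

Put now

$$\psi_i = \frac{n_is_i^2}{Ns_0^2}, \qquad (26.81)$$

and note that

$$\sigma^2\chi^2 = Ns_0^2\left(1-\sum_{i=1}^{k}\psi_i\right). \qquad (26.82)$$

Transforming to variables $\psi_i$ and $s_0^2$, we find

$$dF \propto \prod_{i=1}^{k}\psi_i^{\frac{n_i-3}{2}}\left(1-\sum_{i=1}^{k}\psi_i\right)^{\frac{k-3}{2}}\prod_{i=1}^{k}d\psi_i\;(s_0^2)^{\frac{N-3}{2}}\exp\left(-\frac{Ns_0^2}{2\sigma^2}\right)d(s_0^2),$$

whence, for the distribution of the $\psi$'s,

$$dF \propto \prod_{i=1}^{k}\psi_i^{\frac{n_i-3}{2}}\left(1-\sum_{i=1}^{k}\psi_i\right)^{\frac{k-3}{2}}\prod_{i=1}^{k}d\psi_i. \qquad (26.83)$$

Now

$$\lambda_H = \prod_{i=1}^{k}\left(\frac{N\psi_i}{n_i}\right)^{\frac{1}{2}n_i}, \qquad (26.84)$$

and hence we may find the moments of $\lambda_H$ by integrating its powers over the distribution (26.83). Integrals of this kind, known as Dirichlet's, are expressible in terms of gamma functions, and we find, for the $p$th moment of $\lambda_H$ about zero,

$$\mu_p'(\lambda_H) = \prod_{i=1}^{k}\left(\frac{N}{n_i}\right)^{\frac{1}{2}pn_i}\frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N(1+p)-1}{2}\right)}\prod_{i=1}^{k}\frac{\Gamma\left(\frac{n_i(1+p)-1}{2}\right)}{\Gamma\left(\frac{n_i-1}{2}\right)}. \qquad (26.85)$$

When all the $n$'s are equal (to $n$, say, so that $N = kn$) this reduces to

$$\mu_p'(\lambda_H) = k^{\frac{1}{2}pN}\,\frac{\Gamma\left(\frac{N-1}{2}\right)}{\Gamma\left(\frac{N(1+p)-1}{2}\right)}\left\{\frac{\Gamma\left(\frac{n(1+p)-1}{2}\right)}{\Gamma\left(\frac{n-1}{2}\right)}\right\}^{k}. \qquad (26.86)$$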

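26.39. For the criterion $\lambda_{H_1}$ we start from the distribution

$$dF \propto \prod_{i=1}^{k}s_i^{n_i-2}\exp\left(-\frac{n_is_i^2}{2\sigma^2}\right)\prod_{i=1}^{k}ds_i, \qquad (26.87)$$

and on putting

$$c_i = \frac{n_is_i^2}{Ns_a^2}, \qquad \sum_{i=1}^{k}c_i = 1, \qquad (26.88)$$

we find, in much the same way as before,

$$dF(c_1, \ldots, c_{k-1}) \propto \prod_{i=1}^{k-1}c_i^{\frac{n_i-3}{2}}\left(1-\sum_{i=1}^{k-1}c_i\right)^{\frac{n_k-3}{2}}\prod_{i=1}^{k-1}dc_i. \qquad (26.89)$$

Further,

$$\lambda_{H_1} = \prod_{i=1}^{k}\left(\frac{Nc_i}{n_i}\right)^{\frac{1}{2}n_i},$$

whence we find

$$\mu_p'(\lambda_{H_1}) = \prod_{i=1}^{k}\left(\frac{N}{n_i}\right)^{\frac{1}{2}pn_i}\frac{\Gamma\left(\frac{N-k}{2}\right)}{\Gamma\left(\frac{N(1+p)-k}{2}\right)}\prod_{i=1}^{k}\frac{\Gamma\left(\frac{n_i(1+p)-1}{2}\right)}{\Gamma\left(\frac{n_i-1}{2}\right)}, \qquad (26.90)$$

and, from (26.76) and (26.78),

$$\mu_p'(\lambda_{H_2}) = \frac{\Gamma\left(\frac{N-1}{2}\right)\Gamma\left(\frac{N(1+p)-k}{2}\right)}{\Gamma\left(\frac{N-k}{2}\right)\Gamma\left(\frac{N(1+p)-1}{2}\right)}. \qquad (26.91)$$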

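26.40. For large $n_i$ we find, in virtue of the Stirling approximation to the gamma function,

(1) for $\lambda_H$: $\mu_p' \simeq (p+1)^{-(k-1)}$;

(2) for $\lambda_{H_1}$: $\mu_p' \simeq (p+1)^{-\frac{1}{2}(k-1)}$;

(3) for $\lambda_{H_2}$: $\mu_p' \simeq (p+1)^{-\frac{1}{2}(k-1)}$.

These are the moments of the distributions

$$dF = \frac{(-\log x)^{k-2}}{\Gamma(k-1)}\,dx \quad \text{for (1)}, \qquad dF = \frac{(-\log x)^{\frac{k-3}{2}}}{\Gamma\left(\frac{k-1}{2}\right)}\,dx \quad \text{for (2) and (3)}, \qquad 0 < x < 1.$$

Hence, by the transformation $x = e^{-\frac{1}{2}\chi^2}$, we see that approximately $-2\log\lambda_H$ is distributed as $\chi^2$ with $\nu = 2k-2$, and $-2\log\lambda_{H_1}$ and $-2\log\lambda_{H_2}$ as $\chi^2$ with $\nu = k-1$.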
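To use these large-sample approximations one needs the upper tail of the $\chi^2$ distribution. The sketch below computes it for integer degrees of freedom from the standard recurrence $Q(\nu+2, x) = Q(\nu, x)+(x/2)^{\nu/2}e^{-x/2}/\Gamma(\frac{1}{2}\nu+1)$, starting from $Q(1, x) = \operatorname{erfc}\sqrt{x/2}$ or $Q(0, x) = 0$. The function names are our own, and the resulting probabilities are only as good as the large-sample approximation itself.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(chi^2_df > x) for a positive integer df."""
    if x <= 0:
        return 1.0
    if df % 2:
        q, nu = math.erfc(math.sqrt(x / 2.0)), 1   # Q(1, x)
    else:
        q, nu = 0.0, 0                              # Q(0, x), so the first step gives e^{-x/2}
    while nu < df:
        q += math.exp(0.5 * nu * math.log(x / 2.0) - x / 2.0 - math.lgamma(0.5 * nu + 1.0))
        nu += 2
    return q


def large_sample_pvalues(lam_H, lam_H1, lam_H2, k):
    """Approximate P-values of 26.40: 2k - 2 d.f. for lambda_H, k - 1 d.f. for the others."""
    return {
        "H": chi2_sf(-2.0 * math.log(lam_H), 2 * k - 2),
        "H1": chi2_sf(-2.0 * math.log(lam_H1), k - 1),
        "H2": chi2_sf(-2.0 * math.log(lam_H2), k - 1),
    }
```

Small P-values correspond to small $\lambda$, i.e. to the critical region (26.65).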


26.41. For small samples Neyman and Pearson have suggested approximating to the distributions of $\lambda_H$ and $\lambda_{H_1}$ (or of suitable powers of them) by identifying their lower moments with those of the form

$$dF \propto x^{m_1-1}(1-x)^{m_2-1}\,dx, \qquad 0 \le x \le 1.$$

This possibility has been examined in detail by Nayer (1936) for the hypothesis $H_1$ when all the $n_i$'s are equal. The distribution of this criterion has also been studied by Wilks and Thompson (1937a).


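26.42. Modified forms of the above tests have been considered by various authors. We may write

$$\log\lambda_{H_1} = \tfrac{1}{2}N\log\frac{G}{A}, \qquad (26.92)$$

where, of course,

$$A = \frac{1}{N}\sum_{i=1}^{k}n_is_i^2 = s_a^2, \qquad \log G = \frac{1}{N}\sum_{i=1}^{k}n_i\log s_i^2.$$

In short, $A$ is a weighted mean of the $s_i^2$ and $G$ is a weighted geometric mean. Bartlett (1937c) has proposed using the degrees of freedom $\nu_i\ (= n_i-1)$ instead of $n_i$ in these equations, together with the corresponding estimates $s_i'^2 = n_is_i^2/\nu_i$; that is to say, he defines a criterion

$$\lambda' = \left(\frac{G'}{A'}\right)^{\frac{1}{2}\nu}, \qquad \nu = \sum_{i=1}^{k}\nu_i, \quad A' = \frac{1}{\nu}\sum_{i=1}^{k}\nu_is_i'^2, \quad \log G' = \frac{1}{\nu}\sum_{i=1}^{k}\nu_i\log s_i'^2. \qquad (26.93)$$

This test is, in the sense defined in the next chapter, unbiassed, whereas that based on $\lambda_{H_1}$ is not. Bartlett also suggested as an approximation that $-2\log\lambda'/c$ could be regarded as distributed as $\chi^2$ with $k-1$ degrees of freedom, $c$ being given by

$$c = 1+\frac{1}{3(k-1)}\left(\sum_{i=1}^{k}\frac{1}{\nu_i}-\frac{1}{\nu}\right). \qquad (26.94)$$

This has recently been reconsidered by Hartley (1940), who showed that it is not very exact for large $k$ and gave a better approximation which can be reduced to tabular form. Cf. Exercise 27.2.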
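Bartlett's modified criterion (26.93) and the scale factor (26.94) can be sketched as follows. The code assumes at least two samples, each with at least two members and positive variance, and uses the estimates $s_i'^2$ with divisor $\nu_i$; the function name is hypothetical. The returned ratio $-2\log\lambda'/c$ is to be referred, approximately, to $\chi^2$ with $k-1$ degrees of freedom.

```python
import math


def bartlett_statistic(samples):
    """Return (-2 log lambda', c, -2 log lambda' / c) for Bartlett's criterion."""
    k = len(samples)
    nus = [len(s) - 1 for s in samples]        # nu_i = n_i - 1
    nu = sum(nus)
    vars_prime = []
    for s in samples:
        m = sum(s) / len(s)
        vars_prime.append(sum((x - m) ** 2 for x in s) / (len(s) - 1))  # s_i'^2
    a_prime = sum(n * v for n, v in zip(nus, vars_prime)) / nu                 # A'
    log_g_prime = sum(n * math.log(v) for n, v in zip(nus, vars_prime)) / nu   # log G'
    m_stat = nu * (math.log(a_prime) - log_g_prime)                            # -2 log lambda'
    c = 1.0 + (sum(1.0 / n for n in nus) - 1.0 / nu) / (3.0 * (k - 1))          # (26.94)
    return m_stat, c, m_stat / c
```

For equal sample variances the statistic is zero; for the samples (1, 2, 3) and (0, 2, 4), with $s_1'^2 = 1$ and $s_2'^2 = 4$, it is $4\log 1.25/1.25 \approx 0.714$.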
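Likelihood Criteria for the Linear Hypothesis

26.43. We now proceed to consider the application of the likelihood criterion to the class of linear hypotheses as defined in 26.30. We have, for the likelihood function,

$$p = \frac{1}{(2\pi)^{n/2}\sigma^n}\exp\left\{-\frac{1}{2\sigma^2}\sum_{k=1}^{n}(x_k-\mu_k)^2\right\}. \qquad (26.95)$$

Writing $S^2 = \sum_{k=1}^{n}(x_k-\mu_k)^2$, we have, for the stationary values of $p$ with respect to $\sigma$ and the parameters $\theta$ (related to the $\mu$'s by (26.51)),

$$\frac{\partial}{\partial\sigma}\log p = -\frac{n}{\sigma}+\frac{S^2}{\sigma^3} = 0, \qquad (26.96)$$

$$\frac{\partial}{\partial\theta_j}\log p = \frac{1}{\sigma^2}\sum_{k=1}^{n}c_{jk}(x_k-\mu_k) = 0. \qquad (26.97)$$

This last equation is clearly the one we should get if we were seeking to minimise $S^2$ itself for variations in the $\theta$'s. Let $nS_0^2$ be this minimum value. We shall then have, from (26.96),

$$\hat{\sigma}^2 = S_0^2. \qquad (26.98)$$

The maximum of $p$ in the class $\Omega$ of admissible hypotheses is then

$$p(\Omega \text{ max.}) = \frac{e^{-\frac{1}{2}n}}{(2\pi S_0^2)^{\frac{1}{2}n}}. \qquad (26.99)$$

Similarly the maximum of $p$ in the class $\omega$, for which $\theta_1, \ldots, \theta_r$ are fixed and the other $s$ $\theta$'s vary, is found to be

$$p(\omega \text{ max.}) = \frac{e^{-\frac{1}{2}n}}{\{2\pi(S_0^2+S_1^2)\}^{\frac{1}{2}n}}, \qquad (26.100)$$

where $n(S_0^2+S_1^2)$ is the minimum of $S^2$ under the conditions that $\theta_1, \ldots, \theta_r$ are fixed. Thus we find for the likelihood ratio

$$\lambda = \left(1+\frac{S_1^2}{S_0^2}\right)^{-\frac{1}{2}n}, \qquad (26.101)$$

or, if more convenient, we may use the function

$$Z = \frac{S_1}{S_0}$$

to provide a criterion.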

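Now we make the transformation (26.54) and show that the values $S_0^2$ and $S_1^2$, as we have defined them here, have, in fact, the values given by (26.56) and (26.59). We have, from (26.54),

$$S^2 = \sum_{k=1}^{n}\left\{\sum_{j=1}^{r+s}c_{jk}(\xi_j-\theta_j)+\sum_{j=r+s+1}^{n}c_{jk}y_j\right\}^2 = \sum_{k=1}^{n}\left\{\sum_{j=1}^{r+s}c_{jk}(\xi_j-\theta_j)\right\}^2+\sum_{j=r+s+1}^{n}y_j^2,$$

the cross-products vanishing by (26.55). Since $nS_0^2$ is the minimum of $S^2$ for all variations of the $\theta$'s, and the $\xi$'s and $y$'s are independent of the $\theta$'s, we must have

$$nS_0^2 = \sum_{j=r+s+1}^{n}y_j^2.$$

Also, since $n(S_0^2+S_1^2)$ is the minimum of $S^2$ when the values $\theta_1, \ldots, \theta_r$ are fixed, $nS_1^2$ is seen from (26.58) to have the value given in (26.59).

We have also, when $\theta_1, \ldots, \theta_r$ take the values specified by $H_0$,

$$S^2 = nS_1^2+\sum_{k=r+1}^{r+s}\eta_k^2+nS_0^2, \qquad (26.102)$$

and the frequency function of the $\xi$'s and $y$'s is given by

$$f(\xi_1, \ldots, \xi_{r+s}, y_{r+s+1}, \ldots, y_n) \propto \exp\left[-\frac{1}{2\sigma^2}\left\{nS_1^2+\sum_{k=r+1}^{r+s}\eta_k^2+nS_0^2\right\}\right], \qquad (26.103)$$

the $\eta$'s being evaluated at the true parameter values. Now $nS_0^2$ is the sum of squares of $n-r-s$ normal variates, and hence

$$f(S_0) \propto S_0^{n-r-s-1}\exp\left(-\frac{nS_0^2}{2\sigma^2}\right). \qquad (26.104)$$

Hence, since the $\xi$'s are independent of the $y$'s, and since $S_0$ depends only on the $y$'s,

$$f(S_0, \xi_1, \ldots, \xi_{r+s}) \propto S_0^{n-r-s-1}\exp\left[-\frac{1}{2\sigma^2}\left\{nS_0^2+nS_1^2+\sum_{k=r+1}^{r+s}\eta_k^2\right\}\right]. \qquad (26.105)$$

We have seen, in effect, that $nS_1^2$ is the minimum value of (26.57) when $\theta_1, \ldots, \theta_r$ take their hypothetical values. It depends only on $\xi_1, \ldots, \xi_r$, and hence is independent of $S_0$; and since the frequency function of $\xi_1, \ldots, \xi_r$ is proportional to $\exp(-nS_1^2/2\sigma^2)$, $nS_1^2/\sigma^2$ is distributed as $\chi^2$ with $r$ degrees of freedom, i.e.

$$f(S_1) \propto S_1^{r-1}\exp\left(-\frac{nS_1^2}{2\sigma^2}\right). \qquad (26.106)$$

Thus we have

$$f(S_0, S_1) \propto S_0^{n-r-s-1}S_1^{r-1}\exp\left\{-\frac{n}{2\sigma^2}(S_0^2+S_1^2)\right\}. \qquad (26.107)$$

Putting now $Z = S_1/S_0$, we find

$$f(Z) \propto \frac{Z^{r-1}}{(1+Z^2)^{\frac{1}{2}(n-s)}}, \qquad (26.108)$$

which may be reduced to Fisher's form by putting

$$z = \tfrac{1}{2}\log\left(\frac{n-r-s}{r}Z^2\right) = \log Z+\tfrac{1}{2}\log\frac{n-r-s}{r}.$$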


We have thus reduced the test of the linear hypothesis to the $z$-test, and it is seen that several of the tests introduced in Chapter 21 can be justified on the likelihood criterion. These include the “Student” test for one mean, the extended form for the difference of two means, and the test for the ratio of variances. Certain other tests in which the $z$-distribution (which, of course, reduces to the $t$-distribution for $\nu_1 = 1$) appears, such as those of the correlation ratio, the multiple correlation coefficient and regression coefficients, also depend on linear hypotheses, and in the light of the theory here presented are seen to be different aspects of the same thing, at least so far as the testing of hypotheses is concerned.
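The result of 26.43 suggests a direct computational route: fit the model (26.51) by least squares with and without the restrictions of $H_0$, and compare the minimized sums of squares $nS_0^2$ and $n(S_0^2+S_1^2)$. The sketch below does this with a small Gaussian-elimination solver (our own code, adequate for illustration but not for ill-conditioned designs) and returns $e^{2z} = (S_1^2/r)/\{S_0^2/(n-r-s)\}$ together with $\lambda$ of (26.101). The design is passed as rows $(c_{1k}, \ldots, c_{r+s,k})$, with the $r$ parameters fixed by $H_0$ in the first columns; these conventions and names are assumptions of the sketch.

```python
def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def residual_ss(design, y):
    """Minimum over theta of sum (y_k - mu_k)^2, with mu_k = sum_j c_jk theta_j."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yk for row, yk in zip(design, y)) for i in range(p)]
    theta = solve(xtx, xty)
    return sum((yk - sum(c * t for c, t in zip(row, theta))) ** 2
               for row, yk in zip(design, y))


def linear_hypothesis_F(design, y, r, theta_fixed):
    """e^{2z} and lambda for H0: theta_1..theta_r = theta_fixed (first r columns)."""
    n, p = len(y), len(design[0])
    s = p - r
    nS0 = residual_ss(design, y)                                   # n S0^2
    # Under H0 the known part of mu_k is moved to the left and the s free
    # parameters are fitted to what remains.
    y_adj = [yk - sum(row[j] * theta_fixed[j] for j in range(r)) for row, yk in zip(design, y)]
    nS_restricted = residual_ss([row[r:] for row in design], y_adj)  # n(S0^2 + S1^2)
    nS1 = nS_restricted - nS0
    F = (nS1 / r) / (nS0 / (n - r - s))
    lam = (nS0 / nS_restricted) ** (n / 2)
    return F, lam
```

For the difference of two means with samples (1, 2, 3) and (4, 5, 6), taking $\theta_1 = \delta$ (fixed at 0 by $H_0$) and $\theta_2 = \mu_1$, the ratio is 13.5, the square of the $t$ of Example 26.9, as the reduction of the $z$-distribution to the $t$-distribution for $\nu_1 = 1$ requires.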


26.44. We will indicate briefly, without going into the complicated mathematics involved, some interesting results obtained by P. C. Tang (1938) and P. L. Hsu (1941b) concerning the power of the $z$-test as applied to linear hypotheses.

The functions $S_0^2$ and $S_1^2$, as we have seen, are distributed independently in the $\chi^2$-form, and their ratio accordingly in Fisher's form. From this viewpoint the test of the linear hypothesis is a generalization of the test of homogeneity in the analysis of variance. Tang considers the distribution of

$$E^2 = \frac{S_1^2}{S_0^2+S_1^2}$$

and its variation for errors of the second kind, namely, when the values $\theta_1, \ldots, \theta_r$ are different from the specified values. He shows that the power of the test depends, not on individual alternative values, but on a single function of the $\theta$'s. He also obtains the power function and tabulates it.

Hsu then considers other possible tests which are based on this single function and shows that in this class of test the $z$-test, or the equivalent $E^2$-test, is the uniformly most powerful.


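26.45. For large samples, when maximum likelihood estimators of the parameters exist, the distribution of $-2\log\lambda$ is that of $\chi^2$ with $r$ degrees of freedom. For the distribution of the estimators may then be written (see 17.46) in the form

$$dF = A\exp\left\{-\tfrac{1}{2}\sum_{j,k=1}^{r+s}g_{jk}(\hat{\theta}_j-\theta_j)(\hat{\theta}_k-\theta_k)\right\}d\hat{\theta}_1\ldots d\hat{\theta}_{r+s}, \qquad (26.109)$$

so that

$$p(\Omega \text{ max.}) = A. \qquad (26.110)$$

If $\theta_1, \ldots, \theta_r$ are fixed the likelihood becomes

$$p = A\exp\left\{-\tfrac{1}{2}\sum_{j,k=r+1}^{r+s}g_{jk}z_jz_k-\tfrac{1}{2}\chi_0^2\right\},$$

where

$$\chi_0^2 = \sum_{j,k=1}^{r}g_{jk}'(\hat{\theta}_j-\theta_j)(\hat{\theta}_k-\theta_k), \qquad (26.111)$$

and $z_j$ is given by $z_j = \hat{\theta}_j-\theta_j-L_j$, where $L_j$ is a linear function of the $r$ specified parameters. Thus

$$p(\omega \text{ max.}) = A\,e^{-\frac{1}{2}\chi_0^2}, \qquad (26.112)$$

and hence, when $H_0$ is true, with the $\theta_j$ in (26.111) taking their hypothetical values,

$$\lambda = e^{-\frac{1}{2}\chi_0^2}. \qquad (26.113)$$

But the characteristic function of $\chi_0^2\ (= -2\log\lambda)$ is

$$\int e^{it\chi_0^2}\,dF = A\int\exp\left\{-\tfrac{1}{2}\sum_{j,k=r+1}^{r+s}g_{jk}z_jz_k-\tfrac{1}{2}(1-2it)\chi_0^2\right\}d\hat{\theta}_1\ldots d\hat{\theta}_{r+s} \propto \frac{1}{(1-2it)^{\frac{1}{2}r}}. \qquad (26.114)$$

This is the characteristic function of a quantity distributed as $\chi^2$ with $r$ degrees of freedom, and hence the result follows.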
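The large-sample result of 26.45 can be illustrated by simulation in the simplest case, $H_0$: $\mu = 0$ for a normal sample with $\sigma$ unknown, where $r = 1$. The sketch below (our own code, with an arbitrary seed, sample size and number of replications) computes $-2\log\lambda = n\log\{1+n\bar{x}^2/\sum(x-\bar{x})^2\}$ for repeated samples drawn with $H_0$ true. Its mean should be near 1, and the proportion exceeding the 5 per cent. point of $\chi^2$ with one degree of freedom near 0.05, apart from sampling error and a small-sample excess of order $1/n$.

```python
import math
import random


def minus_two_log_lambda(xs, mu0):
    """-2 log(lambda) for H0: mu = mu0 in a normal sample with sigma unspecified."""
    n = len(xs)
    xbar = sum(xs) / n
    ss_about_mean = sum((x - xbar) ** 2 for x in xs)       # n s^2
    ss_about_mu0 = ss_about_mean + n * (xbar - mu0) ** 2   # sum of (x - mu0)^2
    return n * math.log(ss_about_mu0 / ss_about_mean)


def simulate(n=50, reps=10000, seed=1):
    """Mean of -2 log(lambda) with H0 true, and the proportion of values above
    the 5 per cent. point of chi-square with 1 degree of freedom."""
    rng = random.Random(seed)
    crit = 3.841458820694124
    values = [minus_two_log_lambda([rng.gauss(0.0, 1.0) for _ in range(n)], 0.0)
              for _ in range(reps)]
    mean = sum(values) / reps
    rate = sum(v > crit for v in values) / reps
    return mean, rate
```

Calling `simulate()` returns the pair (mean, proportion); increasing `n` should bring both closer to their limiting values 1 and 0.05.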

26.46. In concluding this chapter we may mention briefly a question which fre- 
quently presents itself when statistical hypotheses are being tested in practice. Our tests 
are based on the observed values obtained in the sampling process, and in order to apply 
them we require no prior knowledge of the parameters to which they relate. They can
be used in a state of complete ignorance about the parameters. But suppose some informa- 
tion is already available ; or suppose that we attach varying degrees of importance to the 
avoidance of particular types of error. How far are the tests developed in this chapter to 
be modified ? 


26.47. Consider, for example, the situation which has already been mentioned in 
connection with the theory of estimation, of the chemist who is assaying the strength of 
a particular drug. If the drug has harmful effects in large quantities it may be much more 
important for him to detect cases in which the true strength exceeds his hypothetical value 
than when the true strength is deficient. Again, the manufacturer of a “guaranteed” product is usually much more concerned with ensuring that it does not fall below the guaranteed standard than that it exceeds such standard. In such circumstances we may be particularly interested in “one-sided” tests of the type $\theta < \theta_0$, and, as we have seen, U.M.P. tests exist more often for this class of alternative than in the case when $\theta$ can have any value. We might, therefore, be quite ready to accept such a test, knowing
quite well that it may be insensitive in part of the range of the unknown parameter, merely 
because errors in that range are relatively unimportant. 

Similarly we might be willing to accept a test which had a poor discriminatory power 
in part of the range but compensating advantages elsewhere, simply because we know 
beforehand that values of the parameter rarely or never fall into that particular part of 
the range. This is equivalent to prior knowledge of the distribution of the values 
determining the alternative hypotheses. 

26.48. It is difficult to reduce rather vague prior knowledge of a parameter to numeri- 
cal form, and hence to extend our theory with great precision to cover these cases ; but in 
practice it is desirable to consider, before adopting a test, whether any prior knowledge is 
available, or whether our interests centre on particular parts of the range. If they do, we 
may consider the behaviour of power functions of the possible tests at our disposal and 
examine which is the more powerful test in the particular part of the range which interests 
us most. The mere fact that the theory developed in this and the succeeding chapter 
makes no assumptions about the prior probabilities of admissible alternatives does not 
mean that we should be acting sensibly in ignoring any prior information which may be 
at hand when applying the theory, or that we need feel compelled to apply tests with 
optimum properties in regions where we know the unknown parameter values will not fall.


NOTES AND REFERENCES 

The theory of this chapter is very largely due to Neyman and E. S. Pearson, whose 
treatment has been closely followed. In their first contribution to the subject (1928) the 
likelihood criterion was developed, the theory of first and second kind of errors and power 
of tests being given in 1933. For the theory of unbiassed tests, see the papers of 1936 and 
1938. In the last few years the literature has grown considerably. 

Feller (1938) has shown that similar regions only exist in rather exceptional circum- 
stances and that the theory of composite hypotheses is incomplete. Tables of certain 
power functions and distributions associated with likelihood tests are given by Mahalanobis (1933), Neyman and Tokarska (1936b), Wilks and Thompson (1937a), P. C. Tang (1938), David (1939), Nayer (1936), and in Tables for Statisticians, Part II (Tables 36-37).

For tests based on the likelihood ratio, see Neyman and Pearson (1928, 1931a, 1931b), Pearson and Wilks (1933b), Wilks (193da), Nayer (1936), Welch (1936a), R. W. Jackson (1936), Sukhatme (1936b), Bartlett (1937c), Wilks and Thompson (1937a), Wilks (1938a), Bishop (1939), G. W. Brown (1939), Mood (1939), Hartley (1940), Wald and Brookner (1941b).

For the general theory, see also Welch (1936), Kolodzieczyk (1936), Neyman (1936b, 1937b, 1938b), Daly (1940), Pitman (1939b), Wald (1939a, 1941a), Wolfowitz (1942), E. S. Pearson (1941, 1942a), Dantzig (1940), P. L. Hsu (1941b), Simaika (1941), MacStewart (1941), Scheffé (1942a, 1943).


EXERCISES 

26.1. Examine the following argument: To accept $H$ when it is false is equivalent to rejecting not-$H$ when not-$H$ is true. Hence, if $K$ = not-$H$, to commit an error of the second kind for $H$ is to commit an error of the first kind for $K$; and thus there is no distinction between the first and second kinds of error.


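26.2. For the distribution

$$dF = \beta e^{-\beta(x-\gamma)}\,dx \quad (x \ge \gamma), \qquad dF = 0 \quad (x < \gamma),$$

show that, in samples of $n$, for a hypothesis that $\beta = \beta_0$, $\gamma = \gamma_0$ and an alternative that $\beta = \beta_1$, $\gamma = \gamma_1$, the best critical region is the region $W_0$, where $p_0 = 0$, together with the region defined by

$$\bar{x} \le \frac{1}{\beta_1-\beta_0}\left\{\log\frac{\beta_1}{\beta_0}+\beta_1\gamma_1-\beta_0\gamma_0-\frac{1}{n}\log k\right\},$$

$k$ being the constant in $p_1 \ge kp_0$, provided that the admissible hypotheses are restricted by the conditions $\gamma_1 < \gamma_0$, $\beta_1 > \beta_0$. Hence show that a U.M.P. test exists in such circumstances.

(Neyman and Pearson, 1936a. This shows that a U.M.P. test can exist for more than one unknown parameter.)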


26.3. If the distribution function of ajj . . . is given by 


dF = ~ 


= exp \-:^i(X^}-ny) dx„ 

o“(23r)a ^ '1=1 f 1=2 J 


$\gamma, \sigma > 0, \qquad -\infty < x_1, \ldots, x_n < \infty,$

show that the frequency function may be put in the form 

/ oc .IP ( - ) exp ( - ; 

and hence that $\bar{x}$ is a “shared” estimator sufficient for $\gamma$ and $\sigma$. Show further that the best critical regions for $\gamma_0$, $\sigma_0$ differ according as $\sigma^2 > \sigma_0^2$, $\sigma^2 < \sigma_0^2$ or $\sigma^2 = \sigma_0^2$, and that their boundaries depend on $\gamma$. Hence no U.M.P. test exists for admissible alternatives $\sigma > 0$.


(Neyman and Pearson, 1936a.) 
26.4. In the previous exercise put $\sigma = \gamma$ and consider the class of hypotheses $\gamma > 0$. Show that there are different best critical regions according as $\gamma > \gamma_0$, $\gamma < \gamma_0$, and that their boundaries depend on $\gamma$. Hence there is no U.M.P. test, but $\bar{x}$ is sufficient for $\gamma$.

(Neyman and Pearson, 1936a.)

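26.5. In samples from a normal population, show that the probability of accepting the hypothesis that the mean $\mu \le \mu_0$ when, in fact, it is false and $\mu = \mu_1 > \mu_0$, that is, the probability of an error of the second kind, is

$$\int_{0}^{\infty}\frac{v^{n-2}e^{-\frac{1}{2}v^2}}{2^{\frac{1}{2}(n-3)}\Gamma\left(\frac{n-1}{2}\right)}\left\{\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{t_0v/\sqrt{n-1}}e^{-\frac{1}{2}(u-\lambda)^2}\,du\right\}dv,$$

where

$$\lambda = \frac{\sqrt{n}\,(\mu_1-\mu_0)}{\sigma},$$

and $t_0$ is the value of $t = (\bar{x}-\mu_0)\sqrt{n-1}/s$ corresponding to the significance level $1-\alpha$ for the control of errors of the first kind.

(Neyman and Tokarska, 1936b.)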




26.6. In six samples of six members each the following values were obtained:

Sample.    Mean.    Variance.
   1       8433      24,722
   2       8200      94,133
   3       7933     149,733
   4       8120      46,037
   5       7971      88,480
   6       8263      49,921

with $s_0^2 = 104{,}688$, $s_a^2 = 75{,}338$.

Show that the two likelihood criteria take the values 0.8608 and 0.6219. The 5-per-cent. levels are respectively 0.67 and 0.64, so that there is no evidence of heterogeneity.

(Pearson, appendix to papers by Wilsdon, 1934.)

26.7. Verify that the likelihood ratio leads to “ Student’s ” test for an unknown 
mean in normal samples, to the use of Fisher’s z in testing the equality of two variances, 
and to the t-test for the difference of two means in normal populations with the same 
variance. 


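26.8. If samples of $n_1, \ldots, n_k$ are drawn from the populations

$$dF = \frac{1}{\sigma_i}\exp\left\{-\frac{x-\beta_i}{\sigma_i}\right\}dx, \qquad x \ge \beta_i, \qquad i = 1, \ldots, k,$$

use the likelihood ratio to test the hypothesis $H$ that the populations are identical, showing that

$$\lambda_H = \frac{\prod_{i=1}^{k}(\bar{x}_i-x_{i1})^{n_i}}{(\bar{x}_0-x_{01})^{N}},$$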


where $\bar{x}_i$ is the mean of the $i$th sample, $x_{i1}$ is the smallest member of that sample, $\bar{x}_0$ is the mean of all samples together and $x_{01}$ is the smallest value in all samples together.

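Show that the joint distribution of $\bar{x}_i$ and $x_{i1}$ is

$$dF \propto (\bar{x}_i-x_{i1})^{n_i-2}\exp\left\{-\frac{n_i(\bar{x}_i-\beta_i)}{\sigma_i}\right\}d\bar{x}_i\,dx_{i1}, \qquad \beta_i \le x_{i1} \le \bar{x}_i,$$

and hence that the moments of $\lambda_H$ are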

Nf‘r{N 




ft> - 


/’(if+J.-l),-. I 


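If $H_1$ is the hypothesis that the populations have the same $\sigma$ but any possible different $\beta$'s, show that

$$\lambda_{H_1} = \frac{\prod_{i=1}^{k}l_i^{n_i}}{\bar{l}^{N}}, \qquad l_i = \bar{x}_i-x_{i1},$$

where $\bar{l}$ is the weighted mean of the $l$'s, and that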




If $H_2$ is the hypothesis that the populations, being known to have identical $\sigma$'s, have the same $\beta$, show that the distribution of

L, = Xah = -- 

IfX 


IS 


^ TN-k-i n — L 

(Sukhatme, 1936b.)


26.9. In the notation of 26.36 show that, if $H$ is true, the criteria $\lambda_{H_1}$ and $\lambda_{H_2}$ are distributed independently.

(Neyman and Pearson, 1931b.)



CHAPTER 27 


GENERAL THEORY OF SIGNIFICANCE-TESTS— (2) 


Bias in Statistical Tests 

27.1. In considering the problem of estimation by confidence intervals in Chapter 19 we had occasion to remark on the rather arbitrary nature of determining the interval so that both inequalities $\theta_1 \le \theta$ and $\theta \le \theta_2$ had equal chances of fulfilment. A point of a similar nature arises in the testing of hypotheses, particularly when an asymmetrical sampling distribution for the criterion is concerned. Consider, for instance, the testing of the hypothesis that in a normal sample of $n$ members the standard deviation $\sigma$ has an assigned value $\sigma_0$, irrespective of the mean $\mu$. As we have seen in Example 26.3, there is no U.M.P. test for all $\sigma > 0$, though there is one for $\sigma > \sigma_0$ and another for $\sigma < \sigma_0$. In choosing a test to cover the whole range $\sigma > 0$ we have, therefore, a certain freedom of choice, since there exists no “best” test as we have previously defined the term. A common test in practical use is to take the sample variance and accept the hypothesis $\sigma = \sigma_0$ if and only if

$$s_1^2 \le s^2 \le s_2^2, \qquad (27.1)$$

where $s_1^2$ and $s_2^2$ are determined from the distribution of $s^2$, namely

$$dF \propto s^{n-3}\exp\left(-\frac{ns^2}{2\sigma_0^2}\right)d(s^2), \qquad (27.2)$$

such that

$$\int_{0}^{s_1^2}dF = \int_{s_2^2}^{\infty}dF = \tfrac{1}{2}(1-\alpha). \qquad (27.3)$$

In short, $s_1^2$ and $s_2^2$ are chosen so as to cut off equal “tail” areas of the distribution. This procedure will, of course, control errors of the first kind; but so equally well would the selection of $s_1^2$ and $s_2^2$ so that

$$\int_{0}^{s_1^2}dF = \tfrac{1}{2}-\alpha_1 \qquad (27.4)$$

and

$$\int_{s_2^2}^{\infty}dF = \tfrac{1}{2}-\alpha_2, \qquad (27.5)$$

provided that $\alpha_1+\alpha_2 = \alpha$. Thus we have an infinite number of regions which will control errors of the first kind. It is natural to seek for some criterion which will distinguish one as better than the others, recognizing that no U.M.P. test exists.


27.2. Such a criterion arises naturally from the following consideration. In the
example given, with $\alpha_1 = \alpha_2 = \tfrac{1}{2}\alpha$, let us calculate the power of the test for different values
of $\sigma$. This can readily be done from the distributions of type (27.2) by means of the incomplete
$\Gamma$-function or the equivalent integral. For any given $\sigma$ we have to find

$$ \beta(s_1^2, s_2^2 \mid \sigma) = \int_0^{s_1^2} dF + \int_{s_2^2}^{\infty} dF, \tag{27.6} $$

where

$$ dF \propto s^{n-3} \exp\left( -\frac{n s^2}{2\sigma^2} \right) d(s^2). \tag{27.7} $$

Fig. 27.1, adapted from Neyman and Pearson (1936), shows the relation between
the power function $\beta$ and $\sigma^2$ for $\alpha_1 = \alpha_2 = 0.49$, $n = 3$, the rejection level being 0.02.



Fig. 27.1. Power Curve in Samples of 3 for $\sigma^2$ from a Normal Population (see text). Horizontal axis: $\sigma^2$ in the sampled population, in units of $\sigma_0^2$.


We see that for $\sigma^2 > \sigma_0^2$ the power increases, and so also for $\sigma^2 < \tfrac{1}{2}\sigma_0^2$. But
between $\tfrac{1}{2}\sigma_0^2$ and $\sigma_0^2$ the power is less than 0.02, i.e. less than $1 - \alpha$. Hence for such values
the chance of an error of the second kind, namely, the acceptance of a false hypothesis,
would be greater than the chance of an error of the first kind, namely, the rejection of
a true hypothesis.
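The dip below 0.02 can be checked numerically. The Python sketch below uses an illustrative function name that does not come from the text. It relies only on the fact that $n s^2/\sigma^2$ is distributed as $\chi^2$ with $n - 1$ degrees of freedom. For $n = 3$ that distribution has the closed distribution function $1 - e^{-x/2}$. The sketch evaluates (27.6) for the equal-tails test with $\alpha = 0.98$.

```python
import math

def equal_tails_power_n3(ratio, alpha=0.98):
    """Power (27.6) of the equal-tails test of sigma = sigma0 for samples of n = 3.

    ratio is sigma**2 / sigma0**2 and alpha is the acceptance probability,
    so each tail carries (1 - alpha) / 2 when H0 is true.  The statistic
    X = n*s**2/sigma0**2 is chi-square with 2 d.f. under H0, with
    P(X <= x) = 1 - exp(-x/2); under the alternative X/ratio has that law.
    """
    tail = (1 - alpha) / 2
    x_low = -2 * math.log(1 - tail)   # lower critical value of X
    x_high = -2 * math.log(tail)      # upper critical value of X
    p_low = 1 - math.exp(-x_low / (2 * ratio))
    p_high = math.exp(-x_high / (2 * ratio))
    return p_low + p_high

for r in (0.25, 0.5, 0.75, 1.0, 2.0, 4.0):
    print(f"sigma^2/sigma0^2 = {r:4.2f}   power = {equal_tails_power_n3(r):.4f}")
```

Write $r = \sigma^2/\sigma_0^2$. The power is then $1 - 0.99^{1/r} + 0.01^{1/r}$. It equals 0.02 at $r = 1$ and again at $r = \tfrac{1}{2}$, because $1 - 0.99^2 + 0.01^2 = 0.02$. Between those values it lies below 0.02, which is the bias visible in Fig. 27.1.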


27.3. Whether this is felt to be anomalous depends on the relative importance of 
the two kinds of error in particular cases ; but, other things being equal, it may be felt 
more important to avoid the second kind than the first, and not to have a greater probability 
of accepting the hypothesis when it is false than of rejecting it when it is true. This, at any 
rate, is the basis of the criterion which we proceed to discuss, namely, that the critical region 
$w$ should be chosen so that $P(E \in w)$ is a minimum when the hypothesis tested is true.

Consider then the case when $H_0$ ascribes to a parameter $\theta$ the value $\theta_0$, and the admissible
alternatives ascribe other values to $\theta$ but do not differ from $H_0$ in other respects. We
shall say that $w$ is an unbiassed critical region if, and only if,

$$ \int_w p(\theta_0)\, dx = P(E \in w \mid \theta_0) = 1 - \alpha \tag{27.8} $$

and for any other $\theta$, say $\theta'$,

$$ \int_w p(\theta')\, dx = P(E \in w \mid \theta') \geq 1 - \alpha. \tag{27.9} $$

Equation (27.8) expresses the usual control of errors of the first kind and (27.9) the minimising
property of $w$. If a region is not unbiassed it will be said to be biassed.

27.4. In certain cases there will exist among the unbiassed regions a $w_0$ such that

$$ \int_{w_0} p(\theta')\, dx \geq \int_w p(\theta')\, dx \tag{27.10} $$

for all admissible $\theta'$. Such a region may be called the best unbiassed critical region and
the test based on it the uniformly most powerful unbiassed test, or briefly the U.M.P.U. 
test. It minimises the risk of errors of the second kind among the class of unbiassed tests. 
As we shall see presently, U.M.P.U. tests do in fact exist in certain cases. 

The use of the word "unbiassed" in this connection is rather arbitrary and is not to
be interpreted as meaning that biassed tests will give systematically wrong results, or that
unbiassed tests are based on unbiassed estimators. Fortunately the different uses of
the term "bias" usually occur in different contexts and confusion is infrequent.


Unbiassed Regions of Type A

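27.5. Following Neyman and Pearson, we now define an unbiassed critical region $w_0$
of Type A as one for which

$$ \int_{w_0} p(\theta_0)\, dx = 1 - \alpha, \tag{27.11} $$

$$ \left[ \frac{\partial}{\partial \theta} \int_{w_0} p\, dx \right]_{\theta = \theta_0} = 0, \tag{27.12} $$

and, for any other region $w$ satisfying (27.11) and (27.12),

$$ \left[ \frac{\partial^2}{\partial \theta^2} \int_{w_0} p\, dx \right]_{\theta = \theta_0} \geq \left[ \frac{\partial^2}{\partial \theta^2} \int_{w} p\, dx \right]_{\theta = \theta_0}. \tag{27.13} $$

We shall, as usual, assume that the differential coefficients exist and shall also assume that
differentiation may be carried out under the integral sign, so that we have for all $w$,

$$ \frac{\partial}{\partial \theta} \int_w p\, dx = \int_w \frac{\partial p}{\partial \theta}\, dx = \int_w p'\, dx, \tag{27.14} $$

and similarly for the second differential coefficient, which we denote by $p''$.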

The first condition (27.11) controls errors of the first kind; the second makes the
region $w$ locally unbiassed; the third, (27.13), implies that as $\theta$ departs from $\theta_0$ the power
function increases more rapidly than for any other unbiassed critical region of the same
size. Thus in the neighbourhood of $\theta_0$ the test may be said to be better than others of the
unbiassed type. It may not be better for larger values of $|\theta - \theta_0|$, but the Type A tests
are based on the supposition that it is more important to detect small errors of the second
kind than to minimise the risk of large errors, which will probably be detected in any case.


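27.6. The regions of Type A may be found by the use of the following theorem:
the region $w_0$ is an unbiassed critical region of Type A if, within $w_0$,

$$ p''(\theta_0) \geq k_1 p'(\theta_0) + k_2 p(\theta_0) \tag{27.15} $$

and outside $w_0$

$$ p''(\theta_0) \leq k_1 p'(\theta_0) + k_2 p(\theta_0), \tag{27.16} $$

where $p'(\theta_0) = [\partial p / \partial \theta]_{\theta = \theta_0}$ and $p''(\theta_0) = [\partial^2 p / \partial \theta^2]_{\theta = \theta_0}$,
and $k_1$, $k_2$ are chosen so as to satisfy (27.11) and (27.12).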
Suppose that $F_0, F_1, \ldots, F_m$ are functions of the $x$'s and that

$$ \int_w F_j\, dx = c_j, \ \text{a constant}, \qquad j = 1, \ldots, m. \tag{27.17} $$

Let $w_0$ be a region such that inside it

$$ F_0 \geq \sum_{j=1}^{m} k_j F_j \tag{27.18} $$

and outside it

$$ F_0 \leq \sum_{j=1}^{m} k_j F_j, \tag{27.19} $$

where the $k$'s are constants chosen so as to satisfy (27.17). Then for any $w$ for which
(27.17) is valid

$$ \int_w F_0\, dx \leq \int_{w_0} F_0\, dx. \tag{27.20} $$

In fact, let $ww_0$ be the common part, if any, of $w$ and $w_0$. As both $w$ and $w_0$ satisfy (27.17),
we have

$$ \int_{w - ww_0} F_j\, dx = \int_{w_0 - ww_0} F_j\, dx. \tag{27.21} $$

Now

$$ \begin{aligned}
\int_{w_0} F_0\, dx - \int_w F_0\, dx &= \int_{w_0 - ww_0} F_0\, dx - \int_{w - ww_0} F_0\, dx \\
&\geq \int_{w_0 - ww_0} \sum_j k_j F_j\, dx - \int_{w - ww_0} \sum_j k_j F_j\, dx \\
&= 0,
\end{aligned} $$

in virtue of (27.21).

In our present case take $F_0$ as $p''(\theta_0)$ and $F_1$, $F_2$ as $p'(\theta_0)$, $p(\theta_0)$ respectively. Then
(27.20) is true, and hence (27.13) is satisfied if (27.18) and (27.19) are true; and these will
be found to reduce to conditions (27.15) and (27.16). The theorem follows.
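The lemma involves nothing but sums and inequalities, so it can be checked by brute force on a finite sample space, with sums over points in place of integrals. The Python sketch below does this for $m = 2$. The integer values of $F_0, F_1, F_2$ and the constants $k_1, k_2$ are arbitrary illustrative choices, and $c_1, c_2$ are simply the values attained by the region $w_0$ that (27.18) and (27.19) define.

```python
import random

def check_lemma(n_points=12, seed=0, k1=1.5, k2=-0.5):
    """Brute-force check of the lemma of 27.6 on a finite sample space.

    Sums over subsets replace integrals over regions. F0, F1, F2 and the
    constants k1, k2 are arbitrary illustrative choices, not from the text.
    """
    rng = random.Random(seed)
    F0 = [rng.randint(-5, 10) for _ in range(n_points)]
    F1 = [rng.randint(1, 6) for _ in range(n_points)]
    F2 = [rng.randint(-3, 3) for _ in range(n_points)]

    # w0: the points where F0 >= k1*F1 + k2*F2, as in (27.18) and (27.19).
    w0 = [i for i in range(n_points) if F0[i] >= k1 * F1[i] + k2 * F2[i]]
    c1 = sum(F1[i] for i in w0)  # the constants c_j of (27.17)
    c2 = sum(F2[i] for i in w0)
    best = sum(F0[i] for i in w0)

    competitors = 0
    for mask in range(1 << n_points):
        w = [i for i in range(n_points) if (mask >> i) & 1]
        # Only regions satisfying the same side conditions (27.17) compete.
        if sum(F1[i] for i in w) == c1 and sum(F2[i] for i in w) == c2:
            competitors += 1
            # (27.20): no such region gives a larger total of F0.
            assert sum(F0[i] for i in w) <= best
    return best, competitors

print(check_lemma())
```

The assertion inside the loop never fails, because the argument of (27.21) holds exactly for sums. The returned count of competing regions always includes $w_0$ itself. Choosing constants that are exact halves keeps the floating-point comparison with integer values exact.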


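27.7. If (27.14) holds, and if there exists a sufficient estimator $t$ for $\theta$, then the
Type A region is bounded by surfaces of constant $t$. For then we have

$$ p(\theta) = p_1(t, \theta)\, p_2(x) \tag{27.22} $$

and hence, from (27.15), on substitution (the factor $p_2(x)$, which does not involve $\theta$, cancelling),

$$ p_1''(t, \theta_0) \geq k_1 p_1'(t, \theta_0) + k_2 p_1(t, \theta_0) $$

within $w_0$, and conversely outside it. The equality must hold on the boundary, and since it
involves the $x$'s only through $t$, this is equivalent to the theorem.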


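27.8. Writing

$$ \phi = \left[ \frac{\partial \log p}{\partial \theta} \right]_{\theta = \theta_0}, \qquad \phi' = \left[ \frac{\partial^2 \log p}{\partial \theta^2} \right]_{\theta = \theta_0}, $$

we have

$$ p'(\theta_0) = \phi\, p(\theta_0), \qquad p''(\theta_0) = (\phi' + \phi^2)\, p(\theta_0), $$

and hence the inequality (27.15) reduces to

$$ \phi' + \phi^2 \geq k_1 \phi + k_2 \tag{27.25} $$

within $w_0$, wherever $p(\theta_0)$ does not vanish; and conversely outside $w_0$.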

We may distinguish three special cases:

(a) If $\phi'$ is a function of $\phi$, say $F(\phi)$, we have

$$ F(\phi) + \phi^2 \geq k_1 \phi + k_2, \tag{27.26} $$

and the Type A region is bounded by the surfaces

$$ \phi = c_j, \qquad j = 1, \ldots, m, \tag{27.27} $$

where $m$ is the number of roots of (27.26) taken as an equation in $\phi$. In this case, as we saw in 17.30, there exists
a sufficient estimator. It follows that $w_0$ is defined by inequalities of the type

$$ c_i < \phi < c_j, $$

and we may, as in 26.24, use the $\phi$'s as new co-ordinates and calculate the size of a region
from their distribution functions.

(b) As a simple case of (a), if

$$ \phi' = A + B\phi \tag{27.28} $$

we find, for (27.26) taken as an equation,

$$ \phi^2 + (B - k_1)\phi + (A - k_2) = 0, \tag{27.29} $$

and the limits of $\phi$ are given by the two roots of this quadratic.

(c) If $\phi'$ cannot be expressed as a function of $\phi$ which does not involve the $x$'s explicitly,
we shall have

$$ \phi' \geq k_2 + k_1 \phi - \phi^2. \tag{27.30} $$

In this case, considering $\phi$ and $\phi'$ as two co-ordinates of a point in a plane, we see that
the region for which (27.30) is true is the one "above" the parabola $\phi' = k_2 + k_1 \phi - \phi^2$,
and that $k_1$, $k_2$ are determined by

$$ \int_{-\infty}^{\infty} d\phi \int_{k_2 + k_1\phi - \phi^2}^{\infty} p(\phi, \phi')\, d\phi' = 1 - \alpha, \tag{27.31} $$

$$ \int_{-\infty}^{\infty} \phi\, d\phi \int_{k_2 + k_1\phi - \phi^2}^{\infty} p(\phi, \phi')\, d\phi' = 0. \tag{27.32} $$

In this instance we can reduce the problem to two dimensions by using two new co-ordinates,
$\phi$ and $\phi'$.


Example 27.1 

Consider the normal distribution 

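$$ dF = \frac{1}{\sqrt{2\pi}} \exp\left\{ -\tfrac{1}{2}(x - \mu)^2 \right\} dx. $$

To apply the foregoing theory with complete rigour we have to show that (27.14) is true.
We assume that this is so, referring the reader for a formal proof to Neyman and
Pearson (1936).

We have, then, with $\theta = \mu$, for a sample of $n$,

$$ \log p(x \mid \mu) = -\tfrac{1}{2} n \log(2\pi) - \tfrac{1}{2}\sum (x - \mu)^2, $$

$$ \phi = \sum (x - \mu_0), \qquad \phi' = -n, $$

and hence this case reduces to that of (27.28). We write

$$ \phi = n(\bar{x} - \mu_0), $$

and can clearly use $\bar{x}$ instead of $\phi$ as a co-ordinate, which confirms the result of 27.7 since
$\bar{x}$ is sufficient for $\mu$.

It follows that the unbiassed region of Type A is given by

$$ \bar{x} < \bar{x}_1, \qquad \bar{x} > \bar{x}_2, $$

where

$$ \int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})\, d\bar{x} = \alpha, $$

$$ \int_{\bar{x}_1}^{\bar{x}_2} p(\bar{x})(\bar{x} - \mu_0)\, d\bar{x} = 0. $$

Now if $H_0$ is true, that is if $\mu = \mu_0$, $\bar{x}$ is distributed in the form

$$ dF = \sqrt{\frac{n}{2\pi}} \exp\left\{ -\tfrac{1}{2} n (\bar{x} - \mu_0)^2 \right\} d\bar{x}, $$

so that the second condition gives $\exp\{-\tfrac{1}{2}n(\bar{x}_1 - \mu_0)^2\} = \exp\{-\tfrac{1}{2}n(\bar{x}_2 - \mu_0)^2\}$.
Hence $\bar{x}_1 - \mu_0 = -(\bar{x}_2 - \mu_0)$ and the Type A region is defined as being outside the range

$$ \mu_0 - \frac{\lambda}{\sqrt{n}} \leq \bar{x} \leq \mu_0 + \frac{\lambda}{\sqrt{n}}, $$

where $\lambda$ is given by

$$ \frac{1}{\sqrt{2\pi}} \int_{-\lambda}^{\lambda} e^{-\frac{1}{2}t^2}\, dt = \alpha. $$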

In this case the Type A test leads to the usual test based on equal tail areas. The 
same test follows from the likelihood ratio, as the reader can verify for himself. 
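The sketch below is a minimal numerical check of Example 27.1. It assumes Python 3.8 or later for `statistics.NormalDist`, and the helper names are illustrative. It computes $\lambda$ from $\alpha$, forms the acceptance interval for $\bar{x}$, and evaluates the power function. That function should equal $1 - \alpha$ at $\mu = \mu_0$ and exceed it elsewhere.

```python
from statistics import NormalDist

def type_a_limits(mu0, n, alpha):
    """Acceptance interval of Example 27.1; alpha is the acceptance probability."""
    lam = NormalDist().inv_cdf((1 + alpha) / 2)  # standard normal mass alpha in (-lam, lam)
    half_width = lam / n ** 0.5
    return mu0 - half_width, mu0 + half_width

def power(mu, mu0, n, alpha):
    """Probability that x-bar falls in the critical region when the true mean is mu."""
    lo, hi = type_a_limits(mu0, n, alpha)
    xbar = NormalDist(mu, 1 / n ** 0.5)  # unit-variance parent, sample of n
    return xbar.cdf(lo) + (1 - xbar.cdf(hi))

print(type_a_limits(0.0, 25, 0.95))
for mu in (-0.4, -0.2, 0.0, 0.2, 0.4):
    print(mu, round(power(mu, 0.0, 25, 0.95), 4))
```

The interval is symmetric about $\mu_0$, so the power is symmetric in $\mu - \mu_0$ and has its minimum, $1 - \alpha$, at $\mu_0$. This is the unbiassedness that the Type A conditions impose.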

Example 27.2 

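If the distribution is normal with zero mean and variance $\sigma^2$, and $H_0$ is that $\sigma = \sigma_0$,
we find, writing $v = \sum x^2 / \sigma_0^2$,

$$ \phi = \frac{1}{\sigma_0}\left( \frac{\sum x^2}{\sigma_0^2} - n \right) = \frac{v - n}{\sigma_0}, \qquad \phi' = \frac{n - 3v}{\sigma_0^2} = -\frac{2n}{\sigma_0^2} - \frac{3}{\sigma_0}\phi. $$

This also satisfies (27.28), and the Type A region will be defined by

$$ v \geq v_2 \quad \text{or} \quad v \leq v_1, $$

where

$$ \int_{v_1}^{v_2} p(v)\, dv = \alpha $$

and

$$ \int_{v_1}^{v_2} p(v)(v - n)\, dv = 0. $$

Here $p(v)$, the frequency function of $v$, is given below. The statistic $v$ is $n$ times the sample second moment in units of $\sigma_0^2$, and is
distributed as $\chi^2$ with $n$ degrees of freedom when $H_0$ is true:

$$ p(v) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, v^{\frac{n}{2} - 1} e^{-\frac{v}{2}}. $$

For the second equation we find

$$ \int_{v_1}^{v_2} v^{\frac{n}{2}} e^{-\frac{v}{2}}\, dv - n \int_{v_1}^{v_2} v^{\frac{n}{2} - 1} e^{-\frac{v}{2}}\, dv = 0. $$

Integrating the first member by parts, $v^{n/2}$ being one part, we are left with

$$ \left[ -2 v^{\frac{n}{2}} e^{-\frac{v}{2}} \right]_{v_1}^{v_2} = 0, \qquad \text{that is,} \qquad v_1^{\frac{n}{2}} e^{-\frac{v_1}{2}} = v_2^{\frac{n}{2}} e^{-\frac{v_2}{2}}. $$

This has to be solved in conjunction with

$$ \frac{1}{2^{n/2}\,\Gamma(n/2)} \int_{v_1}^{v_2} v^{\frac{n}{2} - 1} e^{-\frac{v}{2}}\, dv = \alpha. $$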

The numerical solution can be carried out by successive approximation or graphically. 
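The successive approximation can be done in Python with nested bisection; the sketch below is one way to do it, with illustrative names. As in the example, it takes $v$ to be a $\chi^2$ variate with $n$ degrees of freedom under $H_0$. Its distribution function comes from the standard power series for the regularized incomplete gamma function. The outer search moves $v_1$ over $(0, n)$. For each $v_1$, the inner search finds the $v_2 > n$ at which $v^{n/2}e^{-v/2}$ returns to its value at $v_1$; that function rises to a maximum at $v = n$.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), from the power series
    for the regularized lower incomplete gamma function P(df/2, x/2)."""
    if x <= 0:
        return 0.0
    a, z = df / 2, x / 2
    term = math.exp(a * math.log(z) - z - math.lgamma(a + 1))
    total = term
    k = 1
    while term > 1e-17 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total

def log_h(v, n):
    """log of v**(n/2) * exp(-v/2), the function that must be equal at v1 and v2."""
    return 0.5 * n * math.log(v) - 0.5 * v

def bisect_root(f, lo, hi, iters=200):
    """Root of f in [lo, hi], assuming f changes sign once on the interval."""
    lo_positive = f(lo) > 0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def type_a_variance_limits(n, alpha):
    """(v1, v2) of Example 27.2 for sample size n and acceptance probability alpha."""
    def partner(v1):
        # The v2 > n at which v**(n/2) exp(-v/2) returns to its value at v1.
        target = log_h(v1, n)
        return bisect_root(lambda v: log_h(v, n) - target, n, 50.0 * n + 100.0)

    def excess(v1):
        # Acceptance probability of [v1, partner(v1)] minus alpha; decreasing in v1.
        return chi2_cdf(partner(v1), n) - chi2_cdf(v1, n) - alpha

    v1 = bisect_root(excess, 1e-12, float(n))
    return v1, partner(v1)

print(type_a_variance_limits(2, 0.98))
print(type_a_variance_limits(10, 0.95))
```

For $n = 2$ the two conditions reduce to $e^{-v_1/2} - e^{-v_2/2} = \alpha$ and $v_1 e^{-v_1/2} = v_2 e^{-v_2/2}$. With $\alpha = 0.98$, a rough hand calculation puts about 0.017 of the rejection probability below $v_1$ and about 0.003 above $v_2$. The equal-tails test, by contrast, puts 0.01 in each tail.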

In this connection Fig. 27.2 is of interest. It shows, for samples of two and $\alpha = 0.98$,
the graphs of the power function for the ordinary test with equal tail areas, in addition to
the power functions for the Type A test, the U.M.P. test with $\sigma > \sigma_0$ and the U.M.P. test
with $\sigma < \sigma_0$.

Evidently, for $\sigma > \sigma_0$ the best critical region (2) has the greatest power (as it must
have), and for $\sigma < \sigma_0$ the best region (1) has the greatest power. The test based on equal
tail areas has a greater power than the Type A test for $\sigma > \sigma_0$ but a lower power for $\sigma < \sigma_0$,
besides being biassed, as we have seen.

Fig. 27.2. Power Curves of Four Different Tests of the Variance in Normal Samples of 2 (see text).

As $n$ becomes larger the same effects persist, but the Type A and the "equal tails"
tests become closer together in power. For samples of 20 or more there seems to be no
serious loss in using the latter since the range of bias and its magnitude are then very small.
If, of course, we knew in practice that $\sigma > \sigma_0$ we should use the U.M.P. test, and cases may
arise, even when such knowledge is lacking, where "one-sided" hypotheses of this kind
are all that concern us.
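For samples of two the curves of Fig. 27.2 have closed forms, since $v/(\sigma^2/\sigma_0^2)$ is then a $\chi^2$ with 2 degrees of freedom. The sketch below uses illustrative names and replaces the graphical solution with a bisection. It writes each test as a pair $a = e^{-v_1/2}$, $b = e^{-v_2/2}$, so that its power at $r = \sigma^2/\sigma_0^2$ is $1 - a^{1/r} + b^{1/r}$. It then compares the four tests with $\alpha = 0.98$.

```python
import math

ALPHA = 0.98  # acceptance probability, as in Fig. 27.2

def type_a_tails(alpha=ALPHA, iters=200):
    """(a, b) = (exp(-v1/2), exp(-v2/2)) for the Type A test with n = 2.

    With 2 degrees of freedom the conditions of Example 27.2 become
    a - b = alpha and a*log(a) = b*log(b); g below is increasing in b.
    """
    def g(b):
        a = alpha + b
        return a * math.log(a) - b * math.log(b)

    lo, hi = 1e-300, 1 - alpha  # g(lo) < 0 < g(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    return alpha + b, b

def power_n2(r, a, b):
    """Rejection probability when sigma**2 = r * sigma0**2.

    The test rejects when v < v1 or v > v2; v / r is chi-square with 2 d.f.,
    so P(v < v1) = 1 - a**(1/r) and P(v > v2) = b**(1/r).
    """
    return 1 - a ** (1 / r) + b ** (1 / r)

TESTS = {
    "equal tails": (1 - (1 - ALPHA) / 2, (1 - ALPHA) / 2),
    "Type A": type_a_tails(),
    "U.M.P., sigma > sigma0": (1.0, 1 - ALPHA),  # upper tail only
    "U.M.P., sigma < sigma0": (ALPHA, 0.0),      # lower tail only
}

for r in (0.25, 0.75, 1.0, 4.0):
    cells = ", ".join(f"{name} {power_n2(r, *ab):.4f}" for name, ab in TESTS.items())
    print(f"sigma^2/sigma0^2 = {r}: {cells}")
```

On these formulas the Type A power never falls below $1 - \alpha$ and touches it only at $r = 1$. The equal-tails power dips below $1 - \alpha$ for some $r < 1$. Each one-sided U.M.P. test has the greatest power on its own side, in line with the description of Fig. 27.2.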

Invariance Theorem for Type A Regions

27.9. It is important to show that the regions selected on the basis of Type A criteria
conform to corresponding criteria if some other function $\zeta(\theta)$ is used instead of $\theta$ itself.
In Example 27.2, for instance, where we took $\theta$ to be the standard deviation $\sigma$, should we
have obtained the same regions if we had taken $\theta$ to be the variance $\sigma^2$? The answer is
affirmative under certain general conditions, as we should expect from the relationship
with sufficient estimators.

Suppose we have a new parameter $\zeta$ given by

$$ \theta = \theta_0 + f(\zeta) = \psi(\zeta), \tag{27.33} $$

where $f(0) = 0$. Suppose that $p(\theta)$ satisfies (27.14) and the similar equation in second differentials,
and that $\psi$ is monotonically increasing with $\psi' > 0$. Then the region based on $\zeta$ is an
unbiassed critical region of Type A if that based on $\theta$ is so. It is sufficient to show that (27.15)
and (27.16) are satisfied for $\zeta$. Now

$$ \theta = \psi(\zeta), \qquad \psi(0) = \theta_0, \qquad \left[ \frac{d\theta}{d\zeta} \right]_{\zeta = 0} = \psi' \ \text{(say)}. \tag{27.34} $$

Thus

$$ p_\zeta'(E \mid \psi(0)) = p_\theta'(E \mid \theta_0)\, \psi', $$

$$ p_\zeta''(E \mid \psi(0)) = p_\theta''(E \mid \theta_0)\, \psi'^2 + p_\theta'(E \mid \theta_0)\, \psi''. $$

Solving these for $p_\theta'$ and $p_\theta''$ and substituting in (27.15) and (27.16), we find

$$ p_\zeta''(E \mid \psi(0)) \geq k_1' p_\zeta'(E \mid \psi(0)) + k_2' p_\zeta(E \mid \psi(0)) \tag{27.35} $$

within $w$ and the contrary outside, where

$$ k_1' = \frac{k_1 \psi'^2 + \psi''}{\psi'}, \qquad k_2' = k_2 \psi'^2. \tag{27.36} $$

The result follows.


Regions of Type A$_1$

27.10. The regions of Type A are determined so that tests based on them are
U.M.P.U. in the neighbourhood of $\theta_0$. We now consider a region, said to be of Type A$_1$,
which is U.M.P.U. everywhere, i.e. which obeys (27.11) and (27.12) but has, in place of
(27.13),

$$ \int_{w_0} p\, dx \geq \int_w p\, dx \tag{27.37} $$

for every admissible $\theta$ and every $w$ satisfying the other two conditions.

It is conceivable that (27.37) does not entail the existence of a U.M.P.U. test, for there
might be an unbiassed region of size $1 - \alpha$ for which the derivative of $\int p\, dx$ did not exist
at $\theta = \theta_0$ but which nevertheless gave a more powerful test. This refinement, however,
need not detain us.

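27.11. Let $W$ represent the sample-space where the density is not zero. If $\phi_0' = A + B\phi_0$
in $W$, and if $\phi_0 = \phi(\theta_0)$ does not vanish identically in $W$, then the unbiassed critical region of Type A
is necessarily of Type A$_1$.

Let $w_0$ be the Type A region, which is determined ex hypothesi by two numbers $c_1$
and $c_2$, such that

$$ \phi_0 \leq c_1 \ \text{or} \ \phi_0 \geq c_2 \ \text{inside } w_0, \qquad c_1 < \phi_0 < c_2 \ \text{outside } w_0. $$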
We have to show that

$$ \int_{w_0} p\, dx \geq \int_w p\, dx $$

for all admissible $\theta$ and any $w$ for which

$$ \int_w p_0\, dx = 1 - \alpha, \tag{27.38} $$

with the consequence that

$$ \int_w p_0'\, dx = 0. \tag{27.39} $$


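Since $\phi' = A + B\phi$ we have, solving this equation as a linear differential equation
of the first degree in $\partial \log p / \partial \theta$,

$$ \frac{\partial \log p}{\partial \theta} = \exp\left( \int B\, d\theta \right) \left\{ \int A \exp\left( -\int B\, d\theta \right) d\theta + T \right\}. \tag{27.40} $$

The reader may verify that this is a solution. Since it contains the arbitrary constant
$T$ (independent of $\theta$, though it may depend on the $x$'s), it is the most general solution. It follows that we may write

$$ \log p(x \mid \theta) = P(\theta) + T\,Q(\theta) + f(x), \ \text{say}, \tag{27.41} $$

where $P$ and $Q$ do not depend upon $x$. We then have, primes denoting differentiation with
respect to $\theta$ and the suffix 0 relating to $\theta_0$,

$$ \phi_0 = P_0' + T\,Q_0'. \tag{27.42} $$

We note that $Q_0'$ cannot be zero, for if it were we should have

$$ 0 = \int \phi_0\, p_0\, dx = P_0' \int p_0\, dx = P_0', $$

which would imply that $\phi_0$ was identically zero.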

In virtue of the lemma of 27.6, the proposition will be proved if we can show that
for fixed $\theta$ and $\theta_0$ there are two numbers $a$ and $b$, depending on $\theta$ and $\theta_0$ but not on the
$x$'s, such that

$$ p \geq p_0 (a\phi_0 + b) \quad \text{inside } w_0 \tag{27.43} $$

and the contrary outside $w_0$. Putting the values of $p$ and $\phi_0$ in this expression, we have
to show that $a$ and $b$ can be found such that, inside $w_0$,

$$ \exp\{P(\theta) + TQ(\theta) + f(x)\} \geq \exp\{P(\theta_0) + TQ(\theta_0) + f(x)\}\{aP_0' + aTQ_0' + b\}, $$

or, writing $r = P(\theta) - P(\theta_0)$ and $q = Q(\theta) - Q(\theta_0)$, such that

$$ \exp(r + qT) \geq aQ_0'T + aP_0' + b = a_1 T + b_1, \ \text{say}. \tag{27.44} $$

Here $q$ cannot be zero, for if it were $Q(\theta)$ would be equal to $Q(\theta_0)$ and, integrating the
frequency functions over $W$, we should find $r = 0$. The alternative hypothesis would
not then differ essentially from $H_0$.

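Consider at the outset the case when $c_1$ and $c_2$ are different. From (27.42) we see
that $\phi_0$ depends only on $T$ so far as variation in $x$ is concerned, and that

$$ \text{if } \phi_0 = c_1, \qquad T = \frac{c_1 - P_0'}{Q_0'} = T_1 \ \text{(say)}, \tag{27.45} $$

$$ \text{if } \phi_0 = c_2, \qquad T = \frac{c_2 - P_0'}{Q_0'} = T_2 \ \text{(say)}. \tag{27.46} $$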
Thus $T_1$ and $T_2$ are different. Choose $a_1$ and $b_1$ so as to satisfy

$$ a_1 T_1 + b_1 = e^{r + qT_1}, \qquad a_1 T_2 + b_1 = e^{r + qT_2}. \tag{27.47} $$

Then (27.44) is satisfied at the boundary points. Since $T$ is a linear function of $\phi_0$, we have merely
to prove that

$$ c_1 < \phi_0 < c_2 \ \text{implies} \ e^{r + qT} < a_1 T + b_1, $$

$$ \phi_0 < c_1 \ \text{or} \ \phi_0 > c_2 \ \text{implies} \ e^{r + qT} > a_1 T + b_1. $$

This follows from the fact that

$$ y = e^{r + qT} - a_1 T - b_1 $$

has only one minimum, between $T_1$ and $T_2$, as may be seen by differentiating it twice, for
the second derivative is positive and hence the first is a monotonically increasing function.
But $y$ vanishes at $T_1$ and $T_2$ and hence is negative between those values and positive
outside them.

Finally, if $c_1$ and $c_2$ are equal, say to $c$, we write

$$ P_0' + Q_0' T_0 = c \tag{27.48} $$

and choose $a_1$ and $b_1$ so as to satisfy

$$ q e^{r + qT_0} - a_1 = 0, \qquad e^{r + qT_0} - a_1 T_0 - b_1 = 0. \tag{27.49} $$

It will be found that $y$ has a minimum at $T = T_0$ and vanishes there. It follows that in
the region $W - w_0$ complementary to $w_0$, where $\phi_0 = c$, we have

$$ e^{r + qT} = a_1 T + b_1, $$

and thus in $w_0$, where $\phi_0 < c$ or $c < \phi_0$, the left-hand side must be greater than the right-hand
side. The demonstration is complete.


Example 27.3 

Consider again the data of Example 27.2. We have already seen that for this distribution
$\phi' = A + B\phi$, so that the regions of Type A are also of Type A$_1$. Among
unbiassed tests of the hypothesis this is the uniformly most powerful test.

Composite Hypotheses: Regions of Type B

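27.12. We now consider the extension of the foregoing results to the case when
$H_0$ is composite. For simplicity we will suppose that there are two parameters $\theta_1$ and $\theta_2$,
$H_0$ specifying $\theta_1$ as $\theta_{10}$, say, and leaving $\theta_2$ undetermined. Then a region $w_0$ will be said
to be of Type B if

(a) $\int_{w_0} p(\theta_{10}, \theta_2)\, dx = 1 - \alpha$ for all admissible $\theta_2$; (27.50)

(b) $\int_{w_0} p(\theta_1, \theta_2)\, dx$ may be differentiated twice with respect to $\theta_1$ under the integral
sign;

(c) $\left[ \frac{\partial}{\partial \theta_1} \int_{w_0} p\, dx \right]_{\theta_1 = \theta_{10}} = 0$; (27.51)

(d) for any other region $w$ satisfying (27.50),

$$ \left[ \frac{\partial^2}{\partial \theta_1^2} \int_{w_0} p\, dx \right]_{\theta_1 = \theta_{10}} \geq \left[ \frac{\partial^2}{\partial \theta_1^2} \int_{w} p\, dx \right]_{\theta_1 = \theta_{10}}. \tag{27.52} $$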
These conditions are obvious generalisations of those defining Type A. Putting now

$$ \phi_j = \frac{\partial \log p}{\partial \theta_j}, \tag{27.53} $$

$$ \phi_{jk} = \frac{\partial \phi_j}{\partial \theta_k}, \qquad j, k = 1, 2, \tag{27.54} $$

we state that the Type B region will exist and may be found if $\phi_1$ and $\phi_2$ are algebraically
independent, if

$$ \begin{aligned}
\phi_{11} &= A_0 + A_1 \phi_1 + A_2 \phi_2, \\
\phi_{12} &= B_0 + B_1 \phi_1 + B_2 \phi_2, \\
\phi_{22} &= C_0 + C_2 \phi_2,
\end{aligned} \tag{27.55} $$

and if the law of distribution of $\phi_2$ is uniquely determined by its moments. We omit the
proof of this theorem, for which see Neyman (1935b).

Simple Hypotheses with Two Parameters: Regions of Type C

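27.13. The extension of the foregoing theory to the case of a simple hypothesis
specifying several parameters presents some new features. Again to simplify the discussion
we shall consider two parameters, $\theta_1$ and $\theta_2$.

Consider the power function in the neighbourhood of $\theta_1 = \theta_2 = 0$, which we will suppose
to be the values specified by $H_0$. Writing for the function

$$ \beta(\theta_1, \theta_2 \mid w) = \int_w p(\theta_1, \theta_2)\, dx, \tag{27.56} $$

$$ \beta_j(w) = \left[ \frac{\partial \beta}{\partial \theta_j} \right]_{\theta_1 = \theta_2 = 0}, \qquad j = 1, 2, \tag{27.57} $$

$$ \beta_{jk}(w) = \left[ \frac{\partial^2 \beta}{\partial \theta_j\, \partial \theta_k} \right]_{\theta_1 = \theta_2 = 0}, \qquad j, k = 1, 2, \tag{27.58} $$

we have, assuming an expansion by Taylor's theorem,

$$ \beta(\theta_1, \theta_2 \mid w) = \beta(0, 0 \mid w) + \theta_1 \beta_1(w) + \theta_2 \beta_2(w) + \tfrac{1}{2}\{\theta_1^2 \beta_{11}(w) + 2\theta_1\theta_2 \beta_{12}(w) + \theta_2^2 \beta_{22}(w)\} + \cdots \tag{27.59} $$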

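To extend the idea of unbiassed tests to such a case we require in the first place

$$ \beta_1(w) = 0, \qquad \beta_2(w) = 0. \tag{27.60} $$

Secondly, there will be a minimum at $\theta_1 = \theta_2 = 0$ if

$$ \Delta = \beta_{12}^2 - \beta_{11}\beta_{22} < 0 \tag{27.61} $$

and

$$ \beta_{11} > 0, \qquad \beta_{22} > 0. \tag{27.62} $$

If these conditions are satisfied the power function for small values of $\theta_1$ and $\theta_2$ is effectively

$$ \beta(\theta_1, \theta_2 \mid w) = 1 - \alpha + \tfrac{1}{2}\{\theta_1^2 \beta_{11} + 2\theta_1\theta_2 \beta_{12} + \theta_2^2 \beta_{22}\}. \tag{27.63} $$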

We may represent this diagrammatically as in Fig. 27.3, which shows one of the ellipses
for which the power function is constant.

Since the hypothesis $H_0$ is that $\theta_1 = \theta_2 = 0$, we may speak of the value $\theta_1$ as the "error
in $\theta_1$", and similarly for $\theta_2$. In the case depicted, the co-ordinate axes are not the same as
the principal axes of the ellipse. It is then clear that for values of $\theta_1$ which are not zero, errors of
positive and negative sign in $\theta_2$ are not treated alike. From this viewpoint it may
be said that the minimisation of the power function does not control positive or negative
errors to the same extent; for the points A and B in Fig. 27.3 lie on the ellipse of constant
$\beta$, so that the probability of detecting them is the same, though A represents a positive
"error" in $\theta_2$ greater than the negative "error" given by B.

Fig. 27.3. Ellipse of Constant Power for Simple Hypothesis with Two Parameters (see text).

27.14. Whether this is a desirable property of the test depends to some extent on 
what the test is intended to do. To avoid the anomaly we must require that 

$$ \beta_{12} = 0. \tag{27.64} $$

Furthermore, even if this condition is satisfied and the principal axes of the ellipse coincide 
with the co-ordinate axes, there may still appear anomalies if the length of one axis is greater 
than that of the other ; for then errors in one parameter are not detected as frequently 
as errors of the same size in the other. Here again it is a matter of particular circumstance 
whether such an effect is regarded as objectionable. (We disregard the fact that it can 
be removed by appropriate scaling of the parameters, which may or may not be artificial.) 
To remove it we must require that 

$$ \beta_{11} = \beta_{22}, \tag{27.65} $$

so that the ellipses reduce to circles. 

We may refer to the ellipses as "curves of equidetectability".

27.15. With the foregoing explanation in mind we define $w_0$ as a regular unbiassed
critical region of Type C if it obeys the conditions

$$ \beta_1(w_0) = \beta_2(w_0) = 0, \tag{27.66} $$

$$ \beta_{12}(w_0) = 0, \tag{27.67} $$

$$ \beta_{11}(w_0) = \beta_{22}(w_0), \tag{27.68} $$

and if, for any other region $w$ obeying these three conditions and for which

$$ \beta(0, 0 \mid w_0) = \beta(0, 0 \mid w) = 1 - \alpha, \tag{27.69} $$

we have

$$ \beta_{11}(w_0) \geq \beta_{11}(w). \tag{27.70} $$

Secondly, suppose a region $w_1$ possesses the properties

$$ \beta_1(w_1) = \beta_2(w_1) = 0, \tag{27.71} $$

$$ \beta_{12}^2(w_1) - \beta_{11}(w_1)\beta_{22}(w_1) < 0. \tag{27.72} $$

Suppose further that for any other region $w$ obeying the conditions

$$ \beta(0, 0 \mid w_1) = \beta(0, 0 \mid w) = 1 - \alpha, \tag{27.73} $$

$$ \frac{\beta_{11}(w_1)}{\beta_{11}(w)} = \frac{\beta_{12}(w_1)}{\beta_{12}(w)} = \frac{\beta_{22}(w_1)}{\beta_{22}(w)}, \tag{27.74} $$

we have

$$ \beta_{11}(w_1) \geq \beta_{11}(w). \tag{27.75} $$

Then we shall say that $w_1$ is a non-regular unbiassed critical region of Type C.



These equations are analytical ways of saying that the regular region of Type C is 
the one, among all regions having circular curves of equidetectability, which has the smallest 
radius for any given value of the power function ; whereas the non-regular region of Type C 
is the one, among all regions having similar ellipses of equidetectability, which has the 
smallest axes. 


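27.16. We now state without proof theorems similar to those demonstrated above
for the case of a single parameter.

Write

$$ p_j = \left[ \frac{\partial p}{\partial \theta_j} \right]_{\theta_1 = \theta_2 = 0}, \qquad p_{jk} = \left[ \frac{\partial^2 p}{\partial \theta_j\, \partial \theta_k} \right]_{\theta_1 = \theta_2 = 0}. $$

Then $w_0$ is a regular unbiassed critical region of Type C if

(a) inside $w_0$

$$ p_{11} \geq k_1 (p_{11} - p_{22}) + k_2 p_{12} + k_3 p_1 + k_4 p_2 + k_5 p, \tag{27.76} $$

and outside $w_0$ the inequality is reversed;

(b)

$$ \int_{w_0} p_j\, dx = \int_{w_0} p_{12}\, dx = \int_{w_0} (p_{11} - p_{22})\, dx = 0, \qquad j = 1, 2. \tag{27.77} $$

Secondly, suppose that $w_1$ satisfies the following conditions.

(a) Inside $w_1$

$$ p_{11} \geq k_1 (\gamma_{12} p_{11} - \gamma_{11} p_{12}) + k_2 (\gamma_{22} p_{11} - \gamma_{11} p_{22}) + k_3 p_1 + k_4 p_2 + k_5 p, \tag{27.78} $$

and outside $w_1$ the inequality is reversed, the $k$'s as usual being constants and the $\gamma$'s obeying
the conditions

$$ \gamma_{11} > 0, \qquad \gamma_{12}^2 - \gamma_{11}\gamma_{22} < 0; $$

(b)

$$ \int_{w_1} p_j\, dx = \int_{w_1} (\gamma_{12} p_{11} - \gamma_{11} p_{12})\, dx = \int_{w_1} (\gamma_{22} p_{11} - \gamma_{11} p_{22})\, dx = 0, \qquad j = 1, 2. \tag{27.79} $$

Then $w_1$ is a non-regular unbiassed critical region of Type C, having ellipses of equidetectability
determined by

$$ \gamma_{11}\theta_1^2 + 2\gamma_{12}\theta_1\theta_2 + \gamma_{22}\theta_2^2 = \text{constant}. \tag{27.80} $$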
27.17. The theorem of invariance of 27.9 no longer holds in general for the present
case. Suppose we transform to new parameters $\zeta_1$ and $\zeta_2$. The equations of transformation,
$\theta_j = \psi_j(\zeta_1, \zeta_2)$ $(j = 1, 2)$, will not in general transform an ellipse co-axial with the
co-ordinate axes $\theta_1$, $\theta_2$ into one co-axial with $\zeta_1$, $\zeta_2$. Thus, in general, the effect of a
transformation is to make a regular Type C region into a non-regular Type C region.


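27.18. As usual, the conditions for the Type C region may be simply written in terms
of the derivatives of $\log p$. Write

$$ \phi_j = \left[ \frac{\partial \log p}{\partial \theta_j} \right]_{\theta_1 = \theta_2 = 0}, \tag{27.81} $$

$$ \phi_{jk} = \left[ \frac{\partial^2 \log p}{\partial \theta_j\, \partial \theta_k} \right]_{\theta_1 = \theta_2 = 0}. \tag{27.82} $$

Then if

$$ \phi_{jk} = A_{jk} + B_{jk}\phi_1 + C_{jk}\phi_2, \tag{27.83} $$

we shall have

$$ p_{jk} = (\phi_j \phi_k + A_{jk} + B_{jk}\phi_1 + C_{jk}\phi_2)\, p, \tag{27.84} $$

and the inequality (27.76) becomes

$$ (1 - k_1)\phi_1^2 - k_2 \phi_1\phi_2 + k_1 \phi_2^2 - k_3'\phi_1 - k_4'\phi_2 - k_5' \geq 0, \tag{27.85} $$

where the $k'$ are new constants easily expressible in terms of the old. They must be determined
so as to satisfy (27.77), which reduce to

$$ \int_{w_0} \phi_j\, p\, dx = \int_{w_0} (\phi_1\phi_2 + A_{12})\, p\, dx = \int_{w_0} \{\phi_1^2 - \phi_2^2 + A_{11} - A_{22}\}\, p\, dx = 0. \tag{27.86} $$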

Example 27.4

Suppose we have a sample of $n_1$ from a normal population with mean $\mu_1$ and unit
variance and a second sample of $n_2$ from a normal population with mean $\mu_2$ and also unit
variance. The simple hypothesis to be tested is $\mu_1 = \mu_2 = \mu_0$, where $\mu_0$ is some specified
value. We consider two cases:

(i) in which errors of the same size in $\mu_1$ and $\mu_2$ are equally important;

(ii) in which, for some reason, there is a stronger desire to avoid errors in $\mu_2$ than
in $\mu_1$, so that a greater number $n_2$ of members has been taken in the second
sample. We also assume that the sizes of errors judged of equal importance are
inversely proportional to $\sqrt{n}$, so that we are led to consider new parameters

$$ \eta_1 = \sqrt{n_1}(\mu_1 - \mu_0), \qquad \eta_2 = \sqrt{n_2}(\mu_2 - \mu_0). \tag{27.87} $$

Case 1. The frequency function is

$$ p \propto \exp\left\{ -\tfrac{1}{2}\sum_{i=1}^{n_1} (x_i - \mu_1)^2 - \tfrac{1}{2}\sum_{i=n_1+1}^{n_1+n_2} (x_i - \mu_2)^2 \right\}. $$

Taking $\theta_j = \mu_j - \mu_0$, it will be found that

$$ \phi_1 = n_1(\bar{x}_1 - \mu_0), \qquad \phi_2 = n_2(\bar{x}_2 - \mu_0); $$

$$ \phi_{11} = -n_1 = A_{11}, \qquad \phi_{12} = 0 = A_{12}, \qquad \phi_{22} = -n_2 = A_{22}. $$
From (27.85) we then find

$$ (1 - k_1) n_1^2 (\bar{x}_1 - \mu_0)^2 - k_2 n_1 n_2 (\bar{x}_1 - \mu_0)(\bar{x}_2 - \mu_0) + k_1 n_2^2 (\bar{x}_2 - \mu_0)^2 - k_3' n_1 (\bar{x}_1 - \mu_0) - k_4' n_2 (\bar{x}_2 - \mu_0) - k_5' \geq 0. \tag{27.88} $$

The law of distribution of $\bar{x}_1$ and $\bar{x}_2$ may be written

$$ p \propto \exp\left[ -\tfrac{1}{2}\{n_1(\bar{x}_1 - \mu_0)^2 + n_2(\bar{x}_2 - \mu_0)^2\} \right]. \tag{27.89} $$

Put $u = \sqrt{n_1}(\bar{x}_1 - \mu_0)$ and $v = \sqrt{n_2}(\bar{x}_2 - \mu_0)$.

Then the region $w_0$ is determined by

$$ (1 - k_1) n_1 u^2 - k_2 uv\sqrt{n_1 n_2} + k_1 n_2 v^2 - k_3' u\sqrt{n_1} - k_4' v\sqrt{n_2} - k_5' \geq 0, \tag{27.90} $$

where

$$ \iint_{w_0} p(u, v)\, du\, dv = 1 - \alpha, $$

$$ \iint_{w_0} u\, p(u, v)\, du\, dv = \iint_{w_0} v\, p(u, v)\, du\, dv = \iint_{w_0} uv\, p(u, v)\, du\, dv = 0, \tag{27.91} $$

$$ \iint_{w_0} (n_1 u^2 - n_2 v^2)\, p(u, v)\, du\, dv = (1 - \alpha)(n_1 - n_2), \tag{27.92} $$

and

$$ p(u, v) = \frac{1}{2\pi} \exp\{-\tfrac{1}{2}(u^2 + v^2)\}. $$

It is evident from (27.90) that in the $(u, v)$ plane the boundary of $w_0$ is a conic. From
(27.91) we see that it must be coaxial with the co-ordinate axes and have its centre at the
origin. Hence $k_2 = k_3' = k_4' = 0$. Finally from (27.92) we find that the boundary is
of the form

$$ \frac{u^2}{a^2} + \frac{v^2}{b^2} = 1, \tag{27.93} $$

where

$$ \frac{1}{a^2} = \frac{n_1(1 - k_1)}{k_5'}, \qquad \frac{1}{b^2} = \frac{n_2 k_1}{k_5'}. \tag{27.94} $$

The Type C regions are then defined by (27.93), but we have to express $a$ and $b$ in terms
of known constants, including the probability level $1 - \alpha$. We have to satisfy (27.92),
and will show that a solution always exists.

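Put

$$ F(a, b) = \frac{1}{2\pi} \iint_{w_0} (n_1 u^2 - n_2 v^2) \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\, du\, dv - (n_1 - n_2)(1 - \alpha), \tag{27.95} $$

where $w_0$ is the region outside the ellipse (27.93), $a$ and $b$ being always so related that
$w_0$ has size $1 - \alpha$.

If the boundary of $w_0$ is a circle, its radius is easily found to be

$$ a = b = \sqrt{-2 \log(1 - \alpha)}. $$

The integral $F(a, b)$ outside this circle, by the substitution $u = r\cos\psi$, $v = r\sin\psi$, is
found to be

$$ F(a, a) = (n_1 - n_2)\frac{1}{2\pi} \iint_{w_0} u^2 \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\, du\, dv - (n_1 - n_2)(1 - \alpha) = \tfrac{1}{2}a^2 (1 - \alpha)(n_1 - n_2). $$

Now take $w_0$ as the space outside the parallel lines

$$ v = \pm A, $$

where

$$ \frac{2}{\sqrt{2\pi}} \int_A^{\infty} e^{-\frac{1}{2}x^2}\, dx = 1 - \alpha, $$

which is the case given by $a$ infinite, so that $b = A$.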
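Here the $u$-integral contributes $n_1(1 - \alpha)$. The $v$-integral uses

$$ \frac{1}{\sqrt{2\pi}} \int_{|v| > A} v^2 e^{-\frac{1}{2}v^2}\, dv = (1 - \alpha) + \sqrt{\frac{2}{\pi}}\, A e^{-\frac{1}{2}A^2}. $$

We then find

$$ F(\infty, A) = -(n_1 - n_2)(1 - \alpha) + \frac{1}{2\pi} \iint_{w_0} (n_1 u^2 - n_2 v^2) \exp\{-\tfrac{1}{2}(u^2 + v^2)\}\, du\, dv = -n_2 \sqrt{\frac{2}{\pi}}\, A e^{-\frac{1}{2}A^2} < 0. $$

Similarly,

$$ F(A, \infty) = n_1 \sqrt{\frac{2}{\pi}}\, A e^{-\frac{1}{2}A^2} > 0. $$

Thus, since $F(a, b)$ is continuous it must vanish somewhere in the range $A < a < \infty$,
$A < b < \infty$. The values for which it does so define the Type C region.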
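The existence argument can also be followed numerically. The Python sketch below is one way to do so, not the author's procedure, and all its names are illustrative. It computes the size and $F(a, b)$ of (27.95) for the region outside the ellipse (27.93). The radial integrals are done exactly in polar co-ordinates, and the angle $\psi$ is integrated by the midpoint rule with an arbitrary $M = 400$ points. For a fixed axis ratio $\rho = b/a$ the sketch scales the ellipse until its size is $1 - \alpha$. It then bisects on $\log \rho$ between two limiting cases, which approximate the lines $v = \pm A$ and $u = \pm A$ where $F$ was just shown to have opposite signs.

```python
import math

M = 400  # number of psi points for the midpoint rule (an arbitrary choice)
GRID = [(math.cos(2 * math.pi * (i + 0.5) / M) ** 2,
         math.sin(2 * math.pi * (i + 0.5) / M) ** 2) for i in range(M)]

def outside_ellipse(a, b):
    """For (u, v) standard normal, return P(W0), E[u^2; W0], E[v^2; W0]
    where W0 = {u^2/a^2 + v^2/b^2 >= 1}."""
    size = mu2 = mv2 = 0.0
    for c2, s2 in GRID:
        r2 = 1.0 / (c2 / a ** 2 + s2 / b ** 2)  # squared radius of the ellipse at psi
        tail = math.exp(-0.5 * r2)              # int_R^inf r exp(-r^2/2) dr
        size += tail
        mu2 += c2 * (r2 + 2.0) * tail           # int_R^inf r^3 exp(-r^2/2) dr = (R^2 + 2) exp(-R^2/2)
        mv2 += s2 * (r2 + 2.0) * tail
    return size / M, mu2 / M, mv2 / M

def scale_for_size(rho, alpha):
    """Scale s such that a = s, b = rho*s gives size 1 - alpha (bisection on log s)."""
    lo, hi = -15.0, 15.0
    for _ in range(50):
        mid = 0.5 * (lo + hi)
        s = math.exp(mid)
        if outside_ellipse(s, rho * s)[0] > 1 - alpha:
            lo = mid  # region still too large: enlarge the ellipse
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))

def F(rho, n1, n2, alpha):
    """F(a, b) of (27.95) on the curve of regions of size 1 - alpha, indexed by rho = b/a."""
    s = scale_for_size(rho, alpha)
    _, mu2, mv2 = outside_ellipse(s, rho * s)
    return n1 * mu2 - n2 * mv2 - (n1 - n2) * (1 - alpha)

def type_c_axes(n1, n2, alpha, log_rho_range=6.0):
    """Bisect on log(b/a); assumes F < 0 at the lower end and F > 0 at the upper end."""
    lo, hi = -log_rho_range, log_rho_range
    for _ in range(40):
        mid = 0.5 * (lo + hi)
        if F(math.exp(mid), n1, n2, alpha) < 0:
            lo = mid
        else:
            hi = mid
    rho = math.exp(0.5 * (lo + hi))
    s = scale_for_size(rho, alpha)
    return s, rho * s

a, b = type_c_axes(4, 1, 0.95)
print(a, b)
```

With $n_1 = n_2$ the search returns the circle $a = b = \sqrt{-2\log(1 - \alpha)}$, as the formula for $F(a, a)$ requires. With $n_1 > n_2$ we have $F(a, a) > 0$, and the root found has $b < a$. The sketch does not establish that the root is unique.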


Case 2. In this case, using the parameters $\eta_1$ and $\eta_2$ of (27.87), we find

$$ \phi_1 = u, \qquad \phi_2 = v, $$

$$ \phi_{11} = -1, \qquad \phi_{12} = 0, \qquad \phi_{22} = -1. $$

The inequality becomes

$$ (1 - k_1)u^2 - k_2 uv + k_1 v^2 - k_3' u - k_4' v - k_5' \geq 0, $$

where $\iint_{w_0} (u^2 - v^2)\, p(u, v)\, du\, dv = 0$.

In a similar way it follows that the Type C region is the one lying outside the circle

$$ u^2 + v^2 = -2 \log(1 - \alpha). $$

We leave the verification of this result to the reader.


Certain Limiting Properties 

27.19. From the foregoing examples it will be seen that in certain cases the optimum 
critical regions are by no means easy to determine numerically ; and it is not always clear 
that the labour involved is repaid by the results. Some consideration has been given by 
various writers to tests which have optimum properties for large n, the presumption being 
that the same tests will be good, if not the best, for small values. As usual when several 
limiting processes are involved simultaneously, the rigorous enunciation and proof of 
theorems in this field is a matter of some complexity, and we shall here merely indicate 
some of the results in very general terms without including proofs. 

It has been shown by Neyman (1938b) that there do exist tests which are unbiassed
in the limit, and rules have been given for finding them. It has also been shown by Wald
(1941a) that there exist tests which are most powerful in the limit, and that such as are
based on maximum likelihood estimators are of this class. The tests are uniformly most
powerful for the single parameter $\theta > \theta_0$ and for $\theta < \theta_0$, but not both; and for any range
they are the most powerful unbiassed tests in the limit. Furthermore, the Type A test
tends to the most powerful unbiassed form.

The general conclusion seems to be that, even where the variation is not normal, most
of the tests in current use which are based on likelihood estimators have optimum properties
in the limit, and may therefore be used confidently for moderate or large samples. For
small samples the position is not so clear, particularly for non-normal variation. Tests
based on inefficient estimators are presumably less satisfactory; and for the non-parametric
case there is as yet no complete theory. On this latter question reference may be
made to a useful review by Scheffé (1943).
The Unbiassed Character of Likelihood-ratio Tests

27.20. It is of some interest to consider how far the tests based on the likelihood ratio (26.35)
are unbiassed.

It has been shown (Pitman, 1939b; Brown, 1939) that the Neyman-Pearson test in
the problem of $k$ samples based on the likelihood ratio is biassed unless all the samples are of the same size,
but that Bartlett's modification (26.42) is unbiassed. We prove this in 27.25 below.
On the other hand, Daly (1940) has shown that in certain multivariate tests such as those
of regressions, multiple correlations, Hotelling's $T$ (which we introduce in the next chapter),
and the ordinary analysis of variance and covariance for orthogonal or non-orthogonal
data, the likelihood-ratio tests are unbiassed, at least in the Type A sense (i.e. locally)
and in some cases completely so.


Pitman's Method for Location and Scale Parameters 

27.21. In the special but not uncommon case where the hypotheses under test concern
parameters of scale or location, a simplified approach is possible. Suppose the joint
distribution of $k$ sample-values is

$$ dF = f(x_1 - \theta_1, x_2 - \theta_2, \ldots, x_k - \theta_k)\, dx_1 \ldots dx_k. \tag{27.96} $$

We seek for a statistic $J$, independent of the $\theta$'s, to test the hypothesis; and clearly, if the
test is to be satisfactory, $J$ must be independent of the origin, i.e. must be seminvariant.
The test that the $\theta$'s are all equal is then equivalent to testing the hypothesis

$$ \theta_1 = \theta_2 = \cdots = \theta_k = 0. \tag{27.97} $$

Without loss of generality we may suppose the hypothesis rejected if $J$ is small and less
than some quantity depending on the acceptance value $\alpha$, and we may also suppose $J$
positive; for if either condition is not satisfied we can transfer to some other function of
$J$ for which it is.

In the sample space $W$, $J$ must be constant along any line parallel to the line $x_1 = x_2 = \cdots = x_k$.
Therefore the critical region will be the one lying outside a hypercylinder $w_0$
whose axis is parallel to this line. When $H_0$ is true, the probability of rejection is then

$$ \int_{W - w_0} dF(x_1, \ldots, x_k) = 1 - \alpha, \tag{27.98} $$

and when it is not true the probability of acceptance is

$$ \int_{w_0} f(x_1 - \theta_1, \ldots, x_k - \theta_k)\, dx_1 \ldots dx_k = \int_{w} f(x_1, \ldots, x_k)\, dx_1 \ldots dx_k, \tag{27.99} $$

where $w$ is merely derived from $w_0$ by a translation in $W$ without rotation.

If $L$ is any line parallel to $x_1 = x_2 = \cdots = x_k$, we write

$$ P(L) = \int_L f(x_1, \ldots, x_k)\, d\eta, \tag{27.100} $$

where

$$ \eta = \frac{\sum(x)}{\sqrt{k}}, \tag{27.101} $$

and $\eta$ is thus the distance of the point $(x_1, \ldots, x_k)$ from the plane $\sum(x) = 0$.
Now if $w_0$ is defined as the locus of all lines for which $P(L) > k$, a constant, $P(L)$ will
be less than $k$ on any $L$ which is in $w$ but not in $w_0$. Since $w$ and $w_0$ are both made up of
such lines, with cross-sections of equal content, it follows that

$$ \int_{w_0} dF \geq \int_w dF, \tag{27.102} $$

and so the resulting test is unbiassed. Thus an unbiassed test is given by choosing $J$ so
that at any point of a line $L$ it is equal to $P(L)$ at that point. Now we may write for the
variable co-ordinate on a particular $L$, say $\xi_r$,

$$ x_r = \xi_r + t, \qquad r = 1, \ldots, k, $$

where $\sum \xi_r = 0$ and

$$ t = \frac{\sum(x)}{k}, \qquad \eta = \sqrt{k}\, t. $$

Hence

$$ P(L) = \sqrt{k} \int_{-\infty}^{\infty} f(x_1 - t, x_2 - t, \ldots, x_k - t)\, dt. \tag{27.103} $$

Taking

$$ J = \frac{1}{\sqrt{k}} P(L), $$

we find

$$ J = \int_{-\infty}^{\infty} f(x_1 - t, x_2 - t, \ldots, x_k - t)\, dt, \tag{27.104} $$

which gives us an unbiassed test.


Example 27.S 

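Consider the case where the variables are distributed normally with unit variance, so that

$$f = \frac{1}{(2\pi)^{k/2}}\exp\left\{-\tfrac12\sum_{j=1}^{k}(x_j - \theta_j)^2\right\}.$$

Then we have, from (27.104),

$$J = \frac{1}{(2\pi)^{k/2}}\int_{-\infty}^{\infty}\exp\left\{-\tfrac12\Sigma(x_j - t)^2\right\}dt = \frac{1}{\sqrt{k}\,(2\pi)^{(k-1)/2}}\,e^{-S/2},$$

where $S = \Sigma(x - \bar{x})^2$.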


In practice we should take S as our criterion, not J, and reject the hypothesis that
the means are equal if S exceeds some fixed value determined by α. We observe
that in fact S is distributed as χ² with k − 1 degrees of freedom when $H_0$ is true, so that
this value is easily ascertained.
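As a check on the normal case, the following Python sketch evaluates the integral (27.104) numerically for one arbitrary sample and compares it with the closed form just given. The sample values, the integration limits and the function names are illustrative assumptions, not part of the text; the trapezoidal rule is used only because the integrand is smooth and decays rapidly.

```python
import math

def J_numeric(x, lo=-50.0, hi=50.0, steps=20000):
    """Trapezoidal-rule value of (27.104) when f is the standard k-variate normal density."""
    k = len(x)

    def integrand(t):
        q = sum((xi - t) ** 2 for xi in x)
        return math.exp(-0.5 * q) / (2 * math.pi) ** (k / 2)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return total * h

def J_closed(x):
    """Closed form exp(-S/2) / (sqrt(k) (2 pi)^((k-1)/2)), with S = sum of (x - xbar)^2."""
    k = len(x)
    xbar = sum(x) / k
    S = sum((xi - xbar) ** 2 for xi in x)
    return math.exp(-S / 2) / (math.sqrt(k) * (2 * math.pi) ** ((k - 1) / 2)), S

x = [0.3, -1.2, 0.8, 2.1]          # an arbitrary illustrative sample
J_num = J_numeric(x)
J_cf, S = J_closed(x)
print(J_num, J_cf, S)               # the two J values agree; J falls as S grows
```

Because J is a decreasing function of S alone, rejecting for small J is the same as rejecting for large S.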


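27.22. Consider now the case where the frequency function is

$$\frac{1}{\theta_1\theta_2\cdots\theta_k}\,f\left(\frac{x_1}{\theta_1}, \dots, \frac{x_k}{\theta_k}\right). \qquad (27.105)$$

If the x’s are positive in range we put

$$y_j = \log x_j, \qquad \phi_j = \log\theta_j, \qquad (27.106)$$

and for the frequency function of the y’s we find

$$\exp\{\Sigma(y) - \Sigma(\phi)\}\,f\left(e^{y_1-\phi_1}, \dots, e^{y_k-\phi_k}\right). \qquad (27.107)$$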
This reduces to our first case, and we have an unbiassed criterion that

$$\phi_1 = \phi_2 = \dots = \phi_k$$

by putting

$$J = \int_{-\infty}^{\infty}\exp\{\Sigma(y) - kt\}\,f\left(e^{y_1-t}, \dots, e^{y_k-t}\right)dt = \Pi(x)\int_{-\infty}^{\infty}e^{-kt}\,f\left(x_1e^{-t}, \dots, x_ke^{-t}\right)dt. \qquad (27.108)$$

When the x’s are not necessarily positive the expression remains the same, except that in
(27.108) Π(x) becomes Π(|x|). Small values of J are significant.


27.23. Suppose now that our hypothesis asserts the equality of the θ’s or φ’s and
states that they have a common value θ₀ or φ₀, as the case may be. Then if we take

$$J' = f(x_1 - \theta_0, \dots, x_k - \theta_0) \qquad (27.109)$$

the test will be unbiassed. Moreover, if we regard small values of J' as significant and the
x’s are independent, and if each frequency function is unimodal, then when

$$\theta_1 = \theta_2 = \dots = \theta_k = \theta_0$$

is not true the probability that J' exceeds the specified limit based on 1 − α increases as
any θ tends to θ₀. J' therefore provides an unbiassed test.


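27.24. Finally, consider the case of k variates each distributed in the form typified by

$$dF_j = \frac{1}{\Gamma(m_j)}\exp\left(-\frac{x_j}{\phi_j}\right)\left(\frac{x_j}{\phi_j}\right)^{m_j-1}\frac{dx_j}{\phi_j}. \qquad (27.110)$$

Their joint distribution is

$$dF = \frac{1}{\Pi\{\Gamma(m)\}}\exp\left\{-\Sigma\left(\frac{x}{\phi}\right)\right\}\Pi\left\{\left(\frac{x}{\phi}\right)^{m-1}\frac{dx}{\phi}\right\}. \qquad (27.111)$$

Hence, to test the hypothesis that the samples have the same φ, we have

$$J = \frac{\Gamma(M)}{\Pi\{\Gamma(m)\}}\,\frac{\Pi(x^m)}{\{\Sigma(x)\}^M}, \qquad (27.112)$$

where $M = \Sigma(m)$. It is sometimes convenient to deal with

$$K = \frac{\Pi(x^m)}{\{\Sigma(x)\}^M}, \qquad (27.113)$$

which differs from J only by a constant factor.

The maximum value of K is

$$\frac{\Pi(m^m)}{M^M},$$

and we put

$$L = -\log\frac{K}{\max K} = M\log\left\{\frac{\Sigma(x)}{M}\right\} - \Sigma\left(m\log\frac{x}{m}\right). \qquad (27.114)$$

L is essentially non-negative, and large values are significant.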
For testing the hypothesis that a set of variances have some specified equal value, we
find similarly from (27.109)

$$L' = \Sigma(x) - M - \Sigma\left(m\log\frac{x}{m}\right). \qquad (27.115)$$
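The scale-parameter results can be checked in the same way. The sketch below, which assumes standard gamma densities for f and uses made-up values of x and m, evaluates (27.108) by the trapezoidal rule, compares it with the closed form (27.112), and computes L of (27.114). The function names are hypothetical.

```python
import math

def log_J_closed(x, m):
    """log of (27.112): J = Gamma(M) Pi(x^m) / [Pi Gamma(m) (Sigma x)^M], M = Sigma m."""
    M = sum(m)
    return (math.lgamma(M) - sum(math.lgamma(mi) for mi in m)
            + sum(mi * math.log(xi) for xi, mi in zip(x, m)) - M * math.log(sum(x)))

def J_numeric(x, m, lo=-30.0, hi=30.0, steps=60000):
    """(27.108) by the trapezoidal rule: J = Pi(x) * integral of e^{-kt} f(x e^{-t}) dt,
    where f is a product of standard gamma densities z^(m-1) e^(-z) / Gamma(m)."""
    k = len(x)

    def integrand(t):
        log_f = sum((mi - 1) * (math.log(xi) - t) - xi * math.exp(-t) - math.lgamma(mi)
                    for xi, mi in zip(x, m))
        return math.exp(-k * t + log_f)

    h = (hi - lo) / steps
    total = 0.5 * (integrand(lo) + integrand(hi))
    total += sum(integrand(lo + i * h) for i in range(1, steps))
    return math.prod(x) * total * h

def L_stat(x, m):
    """(27.114): L = M log(Sigma x / M) - Sigma m log(x / m); zero when x is proportional to m."""
    M = sum(m)
    return M * math.log(sum(x) / M) - sum(mi * math.log(xi / mi) for xi, mi in zip(x, m))

x, m = [1.3, 4.0, 2.2], [2.0, 3.0, 1.5]      # illustrative values
print(math.log(J_numeric(x, m)), log_J_closed(x, m), L_stat(x, m))
```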

27.25. The foregoing result has an immediate application to the case of k normal
samples, for the variances are then distributed in the Type III form of equation (27.110).
The criterion L becomes

$$2L = N\log\left\{\frac{\Sigma(\nu s^2)}{N}\right\} - \Sigma(\nu\log s^2), \qquad (27.116)$$

where ν as usual represents the number of degrees of freedom, s² the corresponding estimate
of variance in each sample, and N = Σ(ν). This, as will be seen by comparison with (26.93),
is equivalent to Bartlett’s test, and shows that it is unbiassed.
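A minimal sketch of the criterion (27.116), assuming the sample variances s² and their degrees of freedom ν are already available; `bartlett_2L` is a hypothetical helper name.

```python
import math

def bartlett_2L(variances, dfs):
    """2L of (27.116): N log(Sigma(nu s^2)/N) - Sigma(nu log s^2), N = Sigma nu."""
    N = sum(dfs)
    pooled = sum(nu * s2 for nu, s2 in zip(dfs, variances)) / N
    return N * math.log(pooled) - sum(nu * math.log(s2) for nu, s2 in zip(dfs, variances))

print(bartlett_2L([4.0, 9.0], [10, 10]))        # 20 log(6.5/6), about 1.6009
print(bartlett_2L([5.0, 5.0, 5.0], [4, 7, 9]))  # equal variances give 0 (up to rounding)
```

If raw samples are at hand, `scipy.stats.bartlett` should, to our understanding, return this 2L divided by the correction factor 1 + f of Exercise 27.2 (with m = ν/2).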


NOTES AND REFERENCES 

For the theory of unbiassed tests see particularly Neyman and Pearson (1936; 1938)
and Neyman (1935b). Regions of Type B have also been considered by Scheffé (1942a),
who discusses a Type $B_1$ standing in relation to B as Type $A_1$ to Type A.

For limiting properties see Neyman (1938b) and Wald (1941a).

See also references to the previous chapter. 


EXERCISES 

27.1. Show that the test of Example 27.1 provides regions which are of Type $A_1$
as well as of Type A, and that the test is a U.M.P.U. one.


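27.2. Show that the cumulants of the distribution of L of (27.114) are

$$\kappa_1 = M\{\psi_1(M) - \log M\} - \Sigma[m\{\psi_1(m) - \log m\}],$$

$$\kappa_r = (-1)^r\{\Sigma m^r\psi_r(m) - M^r\psi_r(M)\}, \qquad r > 1,$$

where

$$\psi_r(x) = \frac{d^r}{dx^r}\log\Gamma(x).$$

Hence show that the cumulants of

$$\frac{2L}{1+f}$$

are approximately

$$\kappa_r = 2^{r-1}(k-1)\,\Gamma(r),$$

where

$$f = \frac{1}{6(k-1)}\left\{\Sigma\left(\frac{1}{m}\right) - \frac{1}{M}\right\},$$

and thus that

$$\frac{2L}{1+f}$$

is distributed approximately as χ² with k − 1 degrees of freedom.

(Bartlett, 1937c; Pitman, 1939b.)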
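The approximation of Exercise 27.2 can be explored by simulation. The sketch below assumes H₀ is true (all x's gamma with a common scale), uses arbitrary m's and a fixed random seed, and compares the mean and variance of 2L/(1 + f) with those of χ² on k − 1 = 3 degrees of freedom. It is a rough check, not a proof.

```python
import math
import random

def two_L(x, m):
    """Twice the criterion L of (27.114) for gamma variates x with parameters m."""
    M = sum(m)
    return 2 * (M * math.log(sum(x) / M) - sum(mi * math.log(xi / mi) for xi, mi in zip(x, m)))

def bartlett_f(m):
    """f = {Sigma(1/m) - 1/M} / {6(k - 1)} from Exercise 27.2."""
    k, M = len(m), sum(m)
    return (sum(1 / mi for mi in m) - 1 / M) / (6 * (k - 1))

random.seed(12345)
m = [2.0, 3.0, 5.0, 4.0]          # i.e. nu = 4, 6, 10, 8 in the normal-variance case
f = bartlett_f(m)
reps = 20000
vals = []
for _ in range(reps):
    x = [random.gammavariate(mi, 1.0) for mi in m]   # common scale: H0 true
    vals.append(two_L(x, m) / (1 + f))
mean = sum(vals) / reps
var = sum((v - mean) ** 2 for v in vals) / (reps - 1)
print(mean, var)   # chi-square with k - 1 = 3 d.f. has mean 3 and variance 6
```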
27.3. Show that in samples of 3 from a normal population the distribution of the
range r is given by

$$dF = \frac{6}{\sigma\sqrt{\pi}}\,e^{-r^2/4\sigma^2}\int_0^{r/\sigma\sqrt{6}}\frac{1}{\sqrt{2\pi}}\,e^{-y^2/2}\,dy\,dr.$$

Hence show that an unbiassed critical region of Type A is given by the region lying outside
$r_1 < r < r_2$.

(Neyman and Pearson, 1936.)
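A numerical check of the density in Exercise 27.3: the sketch integrates it by Simpson's rule to confirm total frequency unity and a mean range of 3/√π in units of σ. The helper names and the truncation of the range at 20σ are assumptions made for illustration.

```python
import math

def range3_pdf(r, sigma=1.0):
    """Density of the range of a normal sample of 3 (Exercise 27.3)."""
    # (1/sqrt(2 pi)) * integral_0^z exp(-y^2/2) dy = erf(z / sqrt 2) / 2
    z = r / (sigma * math.sqrt(6))
    inner = 0.5 * math.erf(z / math.sqrt(2))
    return 6 / (sigma * math.sqrt(math.pi)) * math.exp(-r * r / (4 * sigma * sigma)) * inner

def simpson(g, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of panels."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(g(a + i * h) for i in range(2, n, 2))
    return s * h / 3

total = simpson(range3_pdf, 0.0, 20.0)
mean = simpson(lambda r: r * range3_pdf(r), 0.0, 20.0)
print(total, mean, 3 / math.sqrt(math.pi))
```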
MULTIVARIATE ANALYSIS

28.1. We have already considered some aspects of the case in which each member 
of a population is characterised by several variates Xi . . . Xp. For instance, we have 
examined the measurement of correlation between the variates and the regression of one 
variate on some or all of the others. In this chapter we shall extend our inquiries into 
the multivariate case a good deal further, mainly by taking into account the possibility 
that different sample-members may have emanated from different populations. This
will lead to some generalisations of the methods already discussed for the univariate case,
such as tests of homogeneity and tests of differences between two samples. Some of our
known results generalise with nothing more than additional mathematical complexity;
but in others certain new features appear, and the theory of multivariate analysis is not
entirely a matter of generalising univariate results to p dimensions.

28.2. One or two examples will illustrate the kind of problem with which we are 
concerned. A number of skulls are discovered in a burial-ground. They are found to 
vary among themselves in the manner usual in biological material. Is the observed varia- 
tion consistent with the hypothesis that all the skulls were derived from members of the 
same race or does it suggest a mixture of racial types ? If heterogeneity is indicated, do 
the skulls fall into two well-defined categories, such as we might expect if the burial-ground 
were the site of a battle between two races such as Saxon and Celt ; or are there several 
types such as we should expect in the normal burial-ground of a town where races were 
living together and interbreeding ? Or again, if the skulls are compared with another set 
known to have been buried at a much earlier time from the same race, is there any evidence 
of a significant change in skulls from one period to the other ? 

There is no single measurement on a skull which is marked out from the infinite number 
of possible measurements for deciding questions of this kind. It is quite common for 
thirty or forty measurements to be taken by craniometricians on a single skull. Even if 
we reject many of these for practical reasons, leaving out the jawbone, for instance, because 
it is often separated from the skull and cannot be identified, we shall still be left with a 
number p which require consideration. For n skulls we shall then have n sets of p values 
corresponding to variates x^ ... Xp which are, in general, correlated among themselves 
and may be highly so. Our problem is to test the homogeneity of these values, or to esti- 
mate differences between parent populations from which they were derived. We may, 
of course, apply methods which are already familiar by picking out one variate and testing
for homogeneity. But we might pick out quite an unsuitable one and sacrifice most of the 
information. Even if time permits we cannot take each variate in turn and test it because 
the variates are correlated and our p tests are not independent. 

28.3. Again, suppose we have two different breeds of laying hen and are given a 
batch of eggs from the hen-run without knowing which hen laid which egg. We require 
to allocate the eggs to the two breeds. Assuming that there is no decisive criterion such 
as colour of shell, we may measure various properties of the eggs such as length, breadth, 
weight, volume, specific gravity and so on. Some of these measurements will be highly
correlated or, in the extreme case, perfectly correlated, as with weight, volume and specific
gravity. In such circumstances we may reject some variates as redundant; but in general
we shall be left with several sets of measurements. Our problem is to find some method 
based on the retained variates for allocating the eggs to the correct parent breed. In 
particular we might search for the best linear function of the variates to discriminate between 
breeds and to enable us to assign the eggs with the maximum probability of correctness. 

28.4. Throughout the whole chapter we shall, except when the contrary is stated, 
assume that the variation is normal. In addition, to render our formulae a little less
cumbrous we shall borrow a summation convention from the tensor calculus. If the
affixes i, j range from 1 to p we shall write

$$A^{ij}x_i x_j = \sum_{i=1}^{p}\sum_{j=1}^{p}A^{ij}x_i x_j, \qquad (28.1)$$

the affixes to A being regarded as ordinary superscripts, not as powers. Similarly we
shall have

$$A^{ij}x_j = \sum_{j=1}^{p}A^{ij}x_j. \qquad (28.2)$$

Whenever an affix occurs as a superscript and a subscript, summation is to be understood.
Clearly the actual letter used is a dummy and we have, for instance,

$$A^{ij}x_i x_j = A^{kl}x_k x_l. \qquad (28.3)$$

We shall write the array of values $A^{ij}$ (a square matrix) as $(A^{ij})$ and its determinant
as $|A^{ij}|$ or simply as $|A|$.

To every matrix $(a_{ij})$ with a non-vanishing determinant there corresponds a reciprocal
or inverse matrix which we may write $(a^{ij})$. Since

$$(a_{ij})(a^{ij}) = 1,$$

we have, on carrying out the multiplication,

$$a_{ij}a^{ik} = \begin{cases}1, & j = k,\\ 0, & j \neq k,\end{cases}$$

which we may express as

$$a_{ij}a^{ik} = \delta_j^k, \qquad (28.4)$$

where $\delta_j^k$, one form of the Kronecker delta, is zero if $j \neq k$ and unity otherwise. The
quantity $a^{ij}$ is the cofactor (signed minor) of $a_{ij}$ in $|a|$ divided by $|a|$ itself.
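The summation convention of 28.4 is essentially what NumPy's `einsum` implements: index letters repeated in the subscript string are summed. The sketch below, with an arbitrary symmetric matrix chosen for illustration, checks (28.1) against explicit loops, verifies (28.4) for the inverse matrix, and confirms one element of the inverse as a cofactor divided by the determinant.

```python
import numpy as np

# A small symmetric positive-definite matrix (values chosen for illustration).
a = np.array([[2.0, 0.5, 0.3],
              [0.5, 1.0, 0.2],
              [0.3, 0.2, 1.5]])
x = np.array([1.0, -2.0, 0.5])

# (28.1): A^{ij} x_i x_j, the repeated affixes i and j being summed.
quad_einsum = np.einsum("ij,i,j->", a, x, x)
quad_loops = sum(a[i, j] * x[i] * x[j] for i in range(3) for j in range(3))

# (28.4): a_{ij} a^{ik} = delta_j^k, with (a^{ij}) the inverse matrix.
a_inv = np.linalg.inv(a)
delta = np.einsum("ij,ik->jk", a, a_inv)

# a^{01} as the cofactor of a_{01} divided by |a|.
minor_01 = np.delete(np.delete(a, 0, axis=0), 1, axis=1)
cofactor_01 = (-1) ** (0 + 1) * np.linalg.det(minor_01)
print(quad_einsum, quad_loops)
print(np.round(delta, 12))
```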

28.5. It will further simplify our formulae and will give rise to no loss of generality 
if we suppose our variates to be in standard measure, that is to say, to have zero mean 
and unit variance. If we require results for the more general case we can easily obtain
them from transformations of the type

$$x_i' = \frac{x_i - \mu_i}{\sigma_i}. \qquad (28.5)$$

With this convention the equation of the multivariate normal distribution (cf. 15.12,
vol. I, p. 376) may be written

$$dF = \frac{1}{(2\pi)^{p/2}\Delta^{1/2}}\exp\left(-\tfrac12 A^{ij}x_i x_j\right)dx_1\cdots dx_p, \qquad (28.6)$$
where the A’s are related to the correlation determinant

$$\Delta = |\rho_{ij}|. \qquad (28.7)$$

In fact $(A^{ij})$ is reciprocal to $(\rho_{ij})$, as we saw in 15.12.

28.6. We shall also frequently refer to the matrix of sample variances and covariances,
which we shall call the dispersion matrix and write as $(a_{ij})$, where

$$a_{ij} = \frac{1}{n}\sum_{k=1}^{n}(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j). \qquad (28.8)$$

This, it is to be remembered, is in standard measure for the population, that is to say the
observed variates are measured from the parent means and divided by the parent standard
deviations.
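The dispersion matrix (28.8) can be computed directly. The sketch below uses simulated standard-measure data (an assumption for illustration) and keeps the divisor n; NumPy's `np.cov` with `bias=True` uses the same divisor.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 50
x = rng.standard_normal((p, n))        # row i holds x_{i1}, ..., x_{in} (illustrative data)

xbar = x.mean(axis=1, keepdims=True)   # sample means of the p variates
dev = x - xbar
a = dev @ dev.T / n                    # (28.8): divisor n, not n - 1
print(np.round(a, 3))
```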


Wishart’s Distribution

28.7. We now proceed to generalise to p variates the joint distribution of dispersions 
arrived at in 14.12 (vol. I, p. 339) for the bivariate case ; and we shall also show that 
the distribution is independent of that of means. The result and method of proof are 
due to Wishart (1928). 

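First of all let us write the result for the bivariate case in our new notation. For
the distribution of means we have

$$dF = \frac{n|A|^{1/2}}{2\pi}\exp\left(-\frac{n}{2}A^{ij}\bar{x}_i\bar{x}_j\right)d\bar{x}_1\,d\bar{x}_2, \qquad i, j = 1, 2, \qquad (28.9)$$

and for that of dispersions

$$dF = \frac{(n/2)^{n-1}|A|^{(n-1)/2}}{\pi^{1/2}\,\Gamma\left(\frac{n-1}{2}\right)\Gamma\left(\frac{n-2}{2}\right)}\,|a|^{(n-4)/2}\exp\left(-\frac{n}{2}A^{ij}a_{ij}\right)da_{11}\,da_{22}\,da_{12}. \qquad (28.10)$$

For instance, we have

$$\rho_{11} = \rho_{22} = 1, \qquad \rho_{12} = \rho, \qquad \Delta = 1 - \rho^2,$$

$$(A^{ij}) = \begin{pmatrix}\dfrac{1}{1-\rho^2} & \dfrac{-\rho}{1-\rho^2}\\[2mm] \dfrac{-\rho}{1-\rho^2} & \dfrac{1}{1-\rho^2}\end{pmatrix},$$

so that (28.10) is equivalent to

$$dF = \frac{n^{n-1}}{\pi\,\Gamma(n-2)}\,\frac{(1-r^2)^{(n-4)/2}}{(1-\rho^2)^{(n-1)/2}}\,s_1^{n-2}s_2^{n-2}\exp\left\{-\frac{n}{2(1-\rho^2)}\left(s_1^2 - 2\rho r s_1 s_2 + s_2^2\right)\right\}ds_1\,ds_2\,dr.$$

This, with the substitution

$$a_{11} = s_1^2, \qquad a_{22} = s_2^2, \qquad a_{12} = r s_1 s_2,$$

is the form found in equation (14.44), vol. I, p. 342, when it is remembered that we are
working in standard measure.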

28.8. Now consider the general case. With a sample of n values of p variates we 
consider p rectangular spaces of n dimensions each as the domain of variation. If a point 
in one of these spaces be fixed, the variation in the other spaces is constrained for fixed 
values of the sample dispersions. The following argument is a generalisation of that given
in 14.12 leading to the bivariate result, and the reader may like to refresh his memory
by re-reading that section.

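Writing $x_{j1}, \dots, x_{jn}$ for the n values of the jth variate, we have for the density function
of the whole sample, from (28.6),

$$\frac{|A|^{n/2}}{(2\pi)^{np/2}}\exp\left\{-\tfrac12 A^{ij}\sum_{k=1}^{n}(x_{ik}-\bar{x}_i)(x_{jk}-\bar{x}_j)\right\}\times\exp\left(-\frac{n}{2}A^{ij}\bar{x}_i\bar{x}_j\right). \qquad (28.11)$$

We may thus factorise the density function into two parts,

$$\frac{|A|^{1/2}}{(2\pi)^{p/2}}\exp\left(-\frac{n}{2}A^{ij}\bar{x}_i\bar{x}_j\right) \qquad (28.12)$$

and

$$\frac{|A|^{(n-1)/2}}{(2\pi)^{p(n-1)/2}}\exp\left(-\frac{n}{2}A^{ij}a_{ij}\right), \qquad (28.13)$$

where we have apportioned the constant factor so that, on allowing for the factor $n^{p/2}$
which arises when the volume element is expressed in terms of the means (a displacement
$d\bar{x}$ along the equiangular direction of an n-space has length $\sqrt{n}\,d\bar{x}$), the distribution of
means shall have the total frequency unity.

Consider now the volume element $\prod_{k=1}^{n}dx_{1k}\cdots dx_{pk}$. In any particular n-space
the density is constant over hyperspheres centred at the mean. The volume element may
then be represented as the product of elements $d\bar{x}$ and of independent elements depending
on dispersions. In the total space of pn dimensions the volume element may thus be
represented as the product of p elements $d\bar{x}$ and an independent element depending on
dispersions. Thus the volume element also factorises, and we have immediately for the
distribution of means

$$dF = \frac{n^{p/2}|A|^{1/2}}{(2\pi)^{p/2}}\exp\left(-\frac{n}{2}A^{ij}\bar{x}_i\bar{x}_j\right)d\bar{x}_1\cdots d\bar{x}_p, \qquad (28.14)$$

showing that the means are distributed in the multivariate normal form independently
of dispersions.

If we define a matrix (B) with elements n times those of (A), we may write the dis-
tribution of means in the simple form

$$dF = \frac{|B|^{1/2}}{(2\pi)^{p/2}}\exp\left(-\tfrac12 B^{ij}\bar{x}_i\bar{x}_j\right)\prod d\bar{x}. \qquad (28.15)$$

We note that this checks with the known results for p = 1 and p = 2. It is also seen
almost at once that the variance of $\bar{x}_i$ is $\sigma_i^2/n$, as we expect.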
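A simulation check of (28.15): with an assumed correlation matrix and sample size, the covariance matrix of the sample means should be close to the inverse of (B), that is, to $(\rho_{ij})/n$. The values of ρ, n and the number of replications are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
rho = np.array([[1.0, 0.6], [0.6, 1.0]])   # population correlation matrix (assumed)
n, reps = 10, 20000
samples = rng.multivariate_normal(np.zeros(2), rho, size=(reps, n))  # shape (reps, n, 2)
means = samples.mean(axis=1)
emp_cov = np.cov(means, rowvar=False)
B = n * np.linalg.inv(rho)        # B^{ij} = n A^{ij}, with (A^{ij}) reciprocal to (rho_ij)
theory_cov = np.linalg.inv(B)     # should equal rho / n
print(np.round(emp_cov, 4))
print(theory_cov)
```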

28.9. We have now to consider the more complicated expression for the volume 
element of dispersions. Let us in the first instance transfer our origins to the sample means, 
remembering that in doing so we have lost one dimension (or degree of freedom) in the 
variation of our sample-points. Let $P_1, \dots, P_p$ be the sample-points whose co-ordinates
are the n values of $x_1, \dots, x_p$, one point P lying in each n-space. We shall consider in
turn the variation of $P_1$, then that of $P_2$ for fixed $P_1$, then that of $P_3$ for fixed $P_1$ and $P_2$,
and so on. The total variation will be given by multiplying the various expressions so
obtained; and it will be sufficient if we consider the typical case of the variation of $P_m$
for m − 1 fixed points $P_1, \dots, P_{m-1}$.

For a fixed length $OP_m$ and fixed angles with $OP_1, \dots, OP_{m-1}$, $P_m$ can vary on a
hypersphere of n − m dimensions; for, if we fix any particular angle, $P_m$ is constrained
to lie on a hypercone which cuts its hypersphere of variation in a hypersphere of one fewer
dimensions, and the fixation of the origin at the sample mean imposes a further constraint.
Further, if we regard the p spaces as superposed, as we may, the centre of this (n − m)-
dimensional hypersphere is the foot of the perpendicular from $P_m$ on to the space containing
the points O, $P_1, \dots, P_{m-1}$. Call the length of this perpendicular for the time being $h_m$.

The volume of a k-dimensional hypersphere of radius r is

$$\frac{\pi^{k/2}r^k}{\Gamma\left(\frac{k}{2} + 1\right)}, \qquad (28.16)$$

and its surface area, obtained by differentiating with respect to r, is

$$\frac{2\pi^{k/2}r^{k-1}}{\Gamma\left(\frac{k}{2}\right)}.$$

The surface area of the hypersphere of variation of $P_m$ is thus

$$\frac{2\pi^{(n-m)/2}h_m^{\,n-m-1}}{\Gamma\left(\frac{n-m}{2}\right)}. \qquad (28.17)$$

To find the element of volume due to the variation of $P_m$ and the angles which $OP_m$
makes with $OP_1, \dots, OP_{m-1}$ we have to multiply (28.17) by an element of variation
normal to the hypersphere of n − m dimensions. This variation lies in the hyperplane
determined by the origin and $P_1, \dots, P_m$, which is, in fact, normal to the hypersphere.
To evaluate it, take the axes so that $P_1, \dots, P_m$ lie in the first m co-ordinate dimensions,
and consider the transformation

$$n a_{jm} = \sum_{k=1}^{m}x_{jk}x_{mk}, \qquad j = 1, \dots, m, \qquad (28.18)$$

where, of course, the x’s are measured from the sample means in virtue of our choice of
origin. We have for the Jacobian

$$\frac{\partial(n a_{1m}, \dots, n a_{mm})}{\partial(x_{m1}, \dots, x_{mm})} = \begin{vmatrix} x_{11} & x_{12} & \cdots & x_{1m}\\ x_{21} & x_{22} & \cdots & x_{2m}\\ \vdots & & & \vdots\\ 2x_{m1} & 2x_{m2} & \cdots & 2x_{mm}\end{vmatrix} = 2v_m, \qquad (28.19)$$

where $v_m$ is the volume (or “content”) of the hyperparallelopiped having one corner at
the origin and edges running to the points $P_1, \dots, P_m$. Furthermore,

$$v_m^2 = |x_{jk}|^2 = \left|\sum_{k}x_{ik}x_{jk}\right| = n^m|a_{ij}|, \qquad i, j = 1, \dots, m. \qquad (28.20)$$

The required element is thus

$$\frac{n^m}{2v_m}\prod_{k=1}^{m}da_{km},$$

and the total element of variation of $P_m$, on multiplication by (28.17), is

$$\frac{\pi^{(n-m)/2}h_m^{\,n-m-1}n^m}{\Gamma\left(\frac{n-m}{2}\right)v_m}\prod_{k=1}^{m}da_{km}. \qquad (28.21)$$

Now $h_m$ is the length of the perpendicular from $P_m$ on to the space $OP_1 \dots P_{m-1}$
and is therefore equal to $v_m/v_{m-1}$. Hence, for the variation of $P_m$ we have the element

$$\frac{\pi^{(n-m)/2}n^m}{\Gamma\left(\frac{n-m}{2}\right)}\,\frac{v_m^{\,n-m-2}}{v_{m-1}^{\,n-m-1}}\prod_{k=1}^{m}da_{km}. \qquad (28.22)$$

We now derive the total element for variation of $P_1, \dots, P_p$ by multiplying expressions
of type (28.22) for m = 1, 2, ..., p. The terms in v cancel except $v_p$ and $v_0$, the latter
being unity, and we find

$$\frac{\pi^{p(2n-p-1)/4}\,n^{p(p+1)/2}\,v_p^{\,n-p-2}}{\prod_{m=1}^{p}\Gamma\left(\frac{n-m}{2}\right)}\prod_{m=1}^{p}\prod_{k=1}^{m}da_{km}. \qquad (28.23)$$

Now from (28.18) we have

$$\sum_{k}x_{ik}x_{jk} = n a_{ij}, \qquad (28.24)$$

and from (28.20)

$$v_p = n^{p/2}|a|^{1/2}. \qquad (28.25)$$

Making the necessary substitutions in (28.23) and adjoining the frequency element given
by (28.13) we find, after a little reduction,

$$dF = \frac{(n/2)^{p(n-1)/2}\,|A|^{(n-1)/2}}{\pi^{p(p-1)/4}\prod_{k=1}^{p}\Gamma\left(\frac{n-k}{2}\right)}\,|a|^{(n-p-2)/2}\exp\left(-\frac{n}{2}A^{ij}a_{ij}\right)\prod da. \qquad (28.26)$$

This is Wishart’s generalisation of the distribution of dispersions in a multivariate
normal system. The reader who feels that the foregoing proof demands too much of his
powers of geometrical insight may refer to alternative derivations by Wishart and Bartlett
(1933c) or P. L. Hsu (1939a). The domain of variation of the a’s is 0 to ∞ for $a_{ii}$ and
corresponding values for $a_{ij}$, i ≠ j, such that correlations do not exceed unity in absolute
value.
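A direct transcription of (28.26) into a log-density, as a sketch. For p = 1 it should reduce to the density of a when na is distributed as χ² with n − 1 degrees of freedom, and for p = 2 it should agree with the $(s_1, s_2, r)$ form of 28.7 once the Jacobian $4s_1^2s_2^2$ is allowed for; the test checks both. The function name is hypothetical.

```python
import math
import numpy as np

def wishart_dispersion_logpdf(a, A, n):
    """Log of (28.26): density of the dispersion matrix a (divisor n) of a sample of n
    from a p-variate normal in standard measure; A is the inverse of (rho_ij)."""
    p = a.shape[0]
    _, logdet_a = np.linalg.slogdet(a)
    _, logdet_A = np.linalg.slogdet(A)
    log_const = (p * (n - 1) / 2 * math.log(n / 2) + (n - 1) / 2 * logdet_A
                 - p * (p - 1) / 4 * math.log(math.pi)
                 - sum(math.lgamma((n - k) / 2) for k in range(1, p + 1)))
    quad = np.sum(A * a)      # A^{ij} a_{ij}, summed over all i and j
    return log_const + (n - p - 2) / 2 * logdet_a - n / 2 * quad

print(wishart_dispersion_logpdf(np.array([[0.8]]), np.array([[1.0]]), 7))
```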


28.10. It must be remembered that we are regarding $a_{ij}$ as the same as $a_{ji}$, and that
the product of differential elements in (28.26) contains ½p(p + 1) items, not p²; for there
are p elements of the form $da_{ii}$ and ½p(p − 1) of the form $da_{ij}$, i ≠ j. The expanded form
of $A^{ij}a_{ij}$, however, takes place over i, j from 1 to p, so that any particular term such as
$a_{34}$ occurs twice, once as $A^{34}a_{34}$ and once as $A^{43}a_{43}$; except that when i = j the term
occurs once. For instance, with p = 2 we have

$$A^{ij}a_{ij} = A^{11}a_{11} + 2A^{12}a_{12} + A^{22}a_{22}. \qquad (28.27)$$

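We can now derive the characteristic function of the Wishart distribution. Ignoring
constant factors and writing a single integral sign for summation over all $a_{ij}$, we have,
from (28.26),

$$\int |a|^{(n-p-2)/2}\exp\left(-\frac{n}{2}A^{ij}a_{ij}\right)\prod da = \frac{K}{|A|^{(n-1)/2}}, \qquad (28.28)$$

where K is some constant. In this form let us replace $A^{ij}$ by $A^{ij} - \mathrm{i}\theta_{ij}/n$ when i ≠ j,
and by $A^{ii} - 2\mathrm{i}\theta_{ii}/n$ when i = j. Then the resulting integral is the characteristic function
of the a’s, $\theta_{ij}$ being the parameter corresponding to $a_{ij}$. We thus have

$$\phi(\theta) = |A|^{(n-1)/2}\begin{vmatrix} A^{11} - \dfrac{2\mathrm{i}\theta_{11}}{n} & A^{12} - \dfrac{\mathrm{i}\theta_{12}}{n} & \cdots & A^{1p} - \dfrac{\mathrm{i}\theta_{1p}}{n}\\ A^{21} - \dfrac{\mathrm{i}\theta_{21}}{n} & A^{22} - \dfrac{2\mathrm{i}\theta_{22}}{n} & \cdots & A^{2p} - \dfrac{\mathrm{i}\theta_{2p}}{n}\\ \vdots & & & \vdots\\ A^{p1} - \dfrac{\mathrm{i}\theta_{p1}}{n} & A^{p2} - \dfrac{\mathrm{i}\theta_{p2}}{n} & \cdots & A^{pp} - \dfrac{2\mathrm{i}\theta_{pp}}{n}\end{vmatrix}^{-(n-1)/2}, \qquad (28.29)$$

the constant being evaluated by the consideration that φ(0) = 1.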

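Example 28.1

Let us apply these results to an examination of the moments of the distribution of
covariance in the bivariate case. We have

$$A^{11} = A^{22} = \frac{1}{1-\rho^2}, \qquad A^{12} = -\frac{\rho}{1-\rho^2}.$$

We then find for the c.f. of $a_{11}$, $a_{22}$, $a_{12}$

$$\phi = \left(\frac{1}{1-\rho^2}\right)^{(n-1)/2}\begin{vmatrix}\dfrac{1}{1-\rho^2} - \dfrac{2\mathrm{i}\theta_{11}}{n} & -\dfrac{\rho}{1-\rho^2} - \dfrac{\mathrm{i}\theta_{12}}{n}\\[2mm] -\dfrac{\rho}{1-\rho^2} - \dfrac{\mathrm{i}\theta_{12}}{n} & \dfrac{1}{1-\rho^2} - \dfrac{2\mathrm{i}\theta_{22}}{n}\end{vmatrix}^{-(n-1)/2}.$$

We are interested only in the parameter $\theta_{12}$, which we will write as θ, putting the others
equal to zero. We then find

$$\phi(\theta) = \left\{1 - \frac{2\rho\,\mathrm{i}\theta}{n} - \frac{(1-\rho^2)(\mathrm{i}\theta)^2}{n^2}\right\}^{-(n-1)/2}.$$

Taking logarithms and evaluating coefficients of powers of θ, we find for the cumulants

$$\kappa_1 = \frac{n-1}{n}\rho, \qquad \kappa_2 = \frac{n-1}{n^2}(1+\rho^2),$$

$$\kappa_3 = \frac{2(n-1)}{n^3}\rho(3+\rho^2), \qquad \kappa_4 = \frac{6(n-1)}{n^4}(1 + 6\rho^2 + \rho^4).$$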
In standard measure the distribution tends to normality as n tends to infinity. But for
finite n we have

$$\beta_1 = \frac{4\rho^2(3+\rho^2)^2}{(n-1)(1+\rho^2)^3}, \qquad \beta_2 = 3 + \frac{6}{n-1}\,\frac{1+6\rho^2+\rho^4}{(1+\rho^2)^2}.$$

Thus, even when ρ = 0, our distribution, though symmetrical, is not normal.
Wishart (1928) has given formulae as far as those of the fourth order for eight or
fewer variates.
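The cumulants of Example 28.1 can be checked by simulation. The sketch assumes n = 8 and ρ = 0.5 purely for illustration and compares the simulated mean and variance of $a_{12}$ with $\kappa_1$ and $\kappa_2$.

```python
import numpy as np

rng = np.random.default_rng(2)
n, rho, reps = 8, 0.5, 200000                # illustrative sample size and correlation
cov = np.array([[1.0, rho], [rho, 1.0]])
x = rng.multivariate_normal([0.0, 0.0], cov, size=(reps, n))
dev = x - x.mean(axis=1, keepdims=True)
a12 = (dev[:, :, 0] * dev[:, :, 1]).sum(axis=1) / n    # sample covariance, divisor n

kappa1 = (n - 1) / n * rho
kappa2 = (n - 1) / n**2 * (1 + rho**2)
print(a12.mean(), kappa1)
print(a12.var(), kappa2)
```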


Hotelling's Distribution 


28.11. In the univariate case we can test the significance of a mean by comparing
it with the estimated standard deviation, the ratio being distributed in “ Student’s ” form
(or some simple transformation of it if we compare the mean with the actual sample variance
and not the unbiassed estimator). We proceed to generalise this result.

We require a single quantity which will serve as a measure of departure of all the means
from the population values which, as usual, we take to be zero. In place of the matrix
of dispersions, we shall consider the matrix of sums of squares and products $(b_{ij})$, where

$$b_{ij} = \sum_{k=1}^{n}(x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j) = n a_{ij}. \qquad (28.30)$$

As usual we take $(b^{ij})$ to be the matrix inverse to $(b_{ij})$. Let us now write

$$T^2 = n(n-1)\,b^{ij}\bar{x}_i\bar{x}_j. \qquad (28.31)$$

This is Hotelling’s generalisation of the “ Student ” ratio t.

In the simplest case, when p = 1, we have

$$b_{11} = ns^2,$$

and hence

$$T^2 = \frac{(n-1)\bar{x}^2}{s^2}, \qquad (28.32)$$

so that T becomes equal to the ratio t as required.


28.12. We have 

(28.33) 

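Let us now denote by $m_{ij}$ the sum of squares or products about the origin, so that

$$m_{ij} = b_{ij} + n\bar{x}_i\bar{x}_j. \qquad (28.34)$$

The determinant of $m_{ij}$ may be written

$$|m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1\sqrt{n} & \bar{x}_2\sqrt{n} & \cdots & \bar{x}_p\sqrt{n}\\ 0 & b_{11} + n\bar{x}_1^2 & b_{12} + n\bar{x}_1\bar{x}_2 & \cdots & b_{1p} + n\bar{x}_1\bar{x}_p\\ 0 & b_{21} + n\bar{x}_2\bar{x}_1 & b_{22} + n\bar{x}_2^2 & \cdots & b_{2p} + n\bar{x}_2\bar{x}_p\\ \vdots & & & & \vdots\\ 0 & b_{p1} + n\bar{x}_p\bar{x}_1 & b_{p2} + n\bar{x}_p\bar{x}_2 & \cdots & b_{pp} + n\bar{x}_p^2\end{vmatrix}.$$

On subtracting $\bar{x}_1\sqrt{n}$ times the first row from the second, and so on, we find

$$|m_{ij}| = \begin{vmatrix} 1 & \bar{x}_1\sqrt{n} & \cdots & \bar{x}_p\sqrt{n}\\ -\bar{x}_1\sqrt{n} & b_{11} & \cdots & b_{1p}\\ \vdots & & & \vdots\\ -\bar{x}_p\sqrt{n} & b_{p1} & \cdots & b_{pp}\end{vmatrix},$$

and on expanding according to the border row and column,

$$|m_{ij}| = |b_{ij}| + n|b_{ij}|\,b^{ij}\bar{x}_i\bar{x}_j. \qquad (28.35)$$

It follows that

$$|m_{ij}| = |b_{ij}|\left(1 + \frac{T^2}{n-1}\right),$$

or

$$\frac{|b_{ij}|}{|m_{ij}|} = \frac{1}{1 + \dfrac{T^2}{n-1}}. \qquad (28.36)$$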


This is a fundamental equation in the sampling theory of T and we proceed to interpret 
it geometrically. 
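Identity (28.36) is easy to verify numerically. The sketch uses arbitrary simulated data (rows are variates, shifted away from zero only for illustration) and computes $b_{ij}$, $m_{ij}$ and $T^2$ from (28.30), (28.34) and (28.31).

```python
import numpy as np

rng = np.random.default_rng(3)
p, n = 3, 12
x = rng.standard_normal((p, n)) + 0.3                # illustrative data; rows are the p variates
xbar = x.mean(axis=1)
dev = x - xbar[:, None]
b = dev @ dev.T                                      # (28.30): products about the sample means
m = x @ x.T                                          # (28.34): products about the origin
T2 = n * (n - 1) * xbar @ np.linalg.solve(b, xbar)   # (28.31): n(n-1) b^{ij} xbar_i xbar_j
lhs = np.linalg.det(b) / np.linalg.det(m)
rhs = 1 / (1 + T2 / (n - 1))
print(lhs, rhs)                                      # equal up to rounding, as (28.36) states
```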


28.13. In the case p = 1 we have a single sample space of n dimensions. The numera-
tor and denominator of (28.36) then reduce to $b_{11}$ and $m_{11}$, that is to say, the squares of
distances from the sample-point $P_1$ to its projection on the unit vector whose direction
cosines are all equal, and from $P_1$ to the origin, respectively. The ratio of (28.36) has
zero dimensions and is in fact the square of the sine of the angle between $OP_1$ and the unit
vector. This is the geometrical approach which gave us “ Student’s ” distribution in
Example 10.6 (vol. I, p. 239).

In the general case let us regard the p n-spaces as superposed in one n-space. The
points $P_1, \dots, P_p$ will lie in a space of p − 1 dimensions, a hyperplane in the n-space.
Now we may rotate the axes without altering the functions $|m_{ij}|$ or $|b_{ij}|$, which are easily
seen to be invariant under orthogonal variate-transformations. If we perform such a
rotation so as to bring the (p − 1)-space of sample-points into correspondence with p − 1
co-ordinate dimensions, we see from (28.20) that $|m_{ij}|$ is the square of the content of a
hyperparallelopiped with one corner at the origin and sides parallel to $OP_1, \dots, OP_p$.
Now consider a hyperplane perpendicular to the unit vector, meeting it, say, in O',
and let $P_1', \dots, P_p'$ be the projections of the points P on to this hyperplane. Then $b_{ij}$
is the sum of products of the co-ordinates of $P_i'$ and $P_j'$ referred to O', and hence $|b_{ij}|$ is the
square of the content of the hyperparallelopiped in the hyperplane. Furthermore, the content
of this figure bears to that given by $|m_{ij}|$ a ratio equal to the cosine of the angle between
the unit vector and the hyperplane. Representing this angle by θ, we have

$$\frac{|b_{ij}|}{|m_{ij}|} = \cos^2\theta. \qquad (28.37)$$


28.14. Now if the sample-points P are distributed in the n-space with random
orientation, the hyperplane which they determine will be distributed randomly in regard
to the angle which it makes with a fixed vector, and in particular with the unit vector.
The sampling distribution of θ is then that of an angle between a fixed vector and a random
plane. But this, from a slightly different viewpoint, is precisely the problem of distribution
which we solved in connection with the multiple correlation coefficient R, for we saw (15.18,
vol. I, p. 381) that R is the sine of the angle between a residual vector represented …
variate space containing other variates; and in the case when the former is independent
of the latter we can regard it as fixed. Thus, from (28.37) we may write

$$R^2 = 1 - \cos^2\theta = 1 - \frac{1}{1 + \dfrac{T^2}{n-1}}. \qquad (28.38)$$

The distribution of R² in the case when the variate concerned is independent of the
others is

$$dF = \frac{1}{B\left(\frac{p-1}{2}, \frac{n-p}{2}\right)}(R^2)^{(p-3)/2}(1-R^2)^{(n-p-2)/2}\,dR^2, \qquad (28.39)$$

where we must remember that p is the total number of variates and the variates are measured
from their means in forming the regression equation. Before substituting (28.38) in this
expression we must increase p by unity, since in effect we are considering p + 1 variates,
the unit vector determining an additional one; and we must also increase n by unity
because our variation is not restricted to that about the mean, as for multiple correlation.
With these alterations in (28.39), we have, on substituting for R from (28.38) and a little
reduction,

$$dF = \frac{1}{B\left(\frac{p}{2}, \frac{n-p}{2}\right)}\,\frac{\left(\dfrac{T^2}{n-1}\right)^{(p-2)/2}}{\left(1 + \dfrac{T^2}{n-1}\right)^{n/2}}\,d\left(\frac{T^2}{n-1}\right). \qquad (28.40)$$

This is the distribution of Hotelling’s generalisation of “ Student’s ” ratio.
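To see (28.40) at work, the sketch below simulates T² under the null hypothesis (zero population means, an assumption of the check). It compares the empirical probability that T²/(n − 1) ≤ 0.3 with the integral of (28.40), and the mean of T² with (n − 1)p/(n − p − 2), which follows from (28.40). The sample sizes, the cut-off and the helper names are arbitrary.

```python
import math
import numpy as np

def hotelling_density(u, p, n):
    """(28.40): density of u = T^2/(n-1) under the null hypothesis (p >= 2 so u = 0 is finite)."""
    log_B = math.lgamma(p / 2) + math.lgamma((n - p) / 2) - math.lgamma(n / 2)
    return u ** ((p - 2) / 2) * (1 + u) ** (-n / 2) / math.exp(log_B)

def simpson(g, a, b, steps=2000):
    """Composite Simpson rule with an even number of panels."""
    h = (b - a) / steps
    s = g(a) + g(b) + 4 * sum(g(a + i * h) for i in range(1, steps, 2)) \
        + 2 * sum(g(a + i * h) for i in range(2, steps, 2))
    return s * h / 3

rng = np.random.default_rng(4)
p, n, reps = 3, 12, 50000
x = rng.standard_normal((reps, p, n))
xbar = x.mean(axis=2)
dev = x - xbar[:, :, None]
b = np.einsum("rik,rjk->rij", dev, dev)                  # (28.30) for each replicate
sol = np.linalg.solve(b, xbar[:, :, None])[:, :, 0]      # b^{ij} xbar_j
T2 = n * (n - 1) * np.sum(xbar * sol, axis=1)            # (28.31)

c = 0.3
empirical = np.mean(T2 / (n - 1) <= c)
theoretical = simpson(lambda u: hotelling_density(u, p, n), 0.0, c)
print(empirical, theoretical, T2.mean(), (n - 1) * p / (n - p - 2))
```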


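28.15. At the end of the chapter we shall see that this is a particular case of a more
general distribution (28.31). A third and instructive derivation, due to Wilks, is as
follows.

From the manner of derivation of Wishart’s distribution it will be clear that if we
substitute the moments about the origin $a'_{ij}$ for those about the mean the distribution
is the same, except that there is an extra degree of freedom. The distribution is then

$$dF = \frac{(n/2)^{pn/2}|A|^{n/2}}{\pi^{p(p-1)/4}\prod_{k=1}^{p}\Gamma\left(\frac{n+1-k}{2}\right)}\,|a'|^{(n-p-1)/2}\exp\left(-\frac{n}{2}A^{ij}a'_{ij}\right)\prod da'.$$

Since the total frequency is unity, we find, on integration,

$$\int |a'|^{(n-p-1)/2}\exp\left(-\frac{n}{2}A^{ij}a'_{ij}\right)\prod da' = \frac{\pi^{p(p-1)/4}\prod_{k=1}^{p}\Gamma\left(\frac{n+1-k}{2}\right)}{(n/2)^{pn/2}|A|^{n/2}}. \qquad (28.41)$$

Now replace n by n + 2r in this expression and divide by the term on the right in (28.41).
The result is to give us the rth moment of $|a'|$ as

$$\mu_r'(|a'|) = \left(\frac{2}{n}\right)^{pr}\frac{1}{|A|^r}\prod_{k=1}^{p}\frac{\Gamma\left(\frac{n+1-k}{2} + r\right)}{\Gamma\left(\frac{n+1-k}{2}\right)}. \qquad (28.42)$$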


We may also write the distribution of $a'$ in the form given by our original derivation of
Wishart’s distribution:

$$dF = \frac{(n/2)^{p(n-1)/2}|A|^{(n-1)/2}}{\pi^{p(p-1)/4}\prod_{k=1}^{p}\Gamma\left(\frac{n-k}{2}\right)}|a|^{(n-p-2)/2}\exp\left(-\frac{n}{2}A^{ij}a_{ij}\right)\prod da \times \frac{|B|^{1/2}}{(2\pi)^{p/2}}\exp\left(-\tfrac12 B^{ij}\bar{x}_i\bar{x}_j\right)\prod d\bar{x}.$$

Multiply this by $|a'|^r$, integrate, and use (28.42), transferring constant terms to the right
as in (28.41); then replace n by n + 2s and divide by the constant terms as they were
before substitution. We find

, r(“ + ‘-.T‘ + r + .)r(!‘^ + .) 
n r T-vV^tV— 

I -or**-! r/“ + ^ 




Now put r = — s and note that 


a' I I »» I" 


We find

$$E\left\{\left(\frac{|a|}{|a'|}\right)^s\right\} = \frac{\Gamma\left(\frac{n-p}{2} + s\right)\Gamma\left(\frac{n}{2}\right)}{\Gamma\left(\frac{n-p}{2}\right)\Gamma\left(\frac{n}{2} + s\right)}. \qquad (28.44)$$

Now the function on the right is the sth moment of

$$dF = \frac{1}{B\left(\frac{n-p}{2}, \frac{p}{2}\right)}\,x^{(n-p-2)/2}(1-x)^{(p-2)/2}\,dx, \qquad (28.45)$$

which is uniquely determined by its moments. This, then, is the distribution of the ratio
$|a|/|a'|$, and on substitution in terms of T from (28.36) brings us back to the distribution of
(28.40). Incidentally this method gives us one more derivation of the distribution of
multiple correlations and correlation ratios when the respective variates are independent.


Significance of a Set of Means

28.16. Suppose that we have a set of k samples with numbers $n_1, \dots, n_k$, each
from a p-variate population. Let us also suppose that the populations have the same
dispersion matrix but different means, that of the jth variate in the ith sample being $\mu_{j(i)}$.
We proceed to derive a criterion for testing the means simultaneously. Our result is a
generalisation of the testing of k means in normal samples, and we shall obtain it by applying
the same method, namely by using the likelihood criterion

$$\lambda = \frac{p(\omega\ \text{max.})}{p(\Omega\ \text{max.})},$$

as given in equation (26.64). Here ω is the domain for which all the means of the jth
variate have a common value and Ω that for which they have the more general values
$\mu_{j(i)}$.

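Let $b_{ij(l)}$ be the function $b_{ij}$ for the lth sample (l = 1, 2, ..., k) and $\bar{x}_{i(l)}$ the mean
of the ith variate in that sample. Put

$$b_{ij} = \sum_{l=1}^{k}b_{ij(l)}, \qquad (28.46)$$

where, of course,

$$b_{ij(l)} = \sum_{t=1}^{n_l}\left(x_{it(l)} - \bar{x}_{i(l)}\right)\left(x_{jt(l)} - \bar{x}_{j(l)}\right). \qquad (28.47)$$

Put, for the functions of the pooled samples,

$$n = \sum_{l=1}^{k}n_l, \qquad \bar{x}_i = \frac{1}{n}\sum_{l=1}^{k}\sum_{t=1}^{n_l}x_{it(l)} = \frac{1}{n}\sum_{l=1}^{k}n_l\bar{x}_{i(l)}. \qquad (28.48)$$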

If then 

(/) ~ ^ (^it il) ■“ (/)) ID “** h (/)) 

. (28.49) 

. (28.60) 

the likelihood of all samples together is 

c 1 ^ 1*^ exp {- \ E {ni ^^j) }, . 

. (28.51) 


where c is a constant. 

Taking logarithms and differentiating, we have for the maximum value equations 
typified by 

E Eui — /q (/)) + — /iy (ij) } = 0, 

i t 

which reduce to 

N it)- • • • • • • (28.52) 

The maximum likelihood values of the m’s are then given by 


(0 (ly 

Furthermore, the values of $A^{ij}$ are then given by the inverse of the matrix $(b_{ij}/n)$, and the
exponent of (28.51) becomes

$$-\tfrac12\Sigma\left(A^{ij}b_{ij}\right) = -\tfrac12 np. \qquad (28.53)$$
. (28.56) 
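Hence

$$\lambda = \left(\frac{|b_{ij}|}{|m_{ij}|}\right)^{n/2},$$

where $m_{ij}$ is the sum of products for the pooled samples about the pooled means $\bar{x}_i$,
and we may write

$$L = \lambda^{2/n} = \frac{|b_{ij}|}{|m_{ij}|}, \qquad (28.56)$$

and take L as our criterion.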
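A sketch of the criterion L of (28.56) for several samples, assuming each sample is stored as an ($n_l \times p$) array. For k = 2 the ratio should reduce to $1/\{1 + (n_1n_2/n)\,d'b^{-1}d\}$, with d the difference of the two sample means, the two-sample analogue of (28.36); the test checks this. The data and names are illustrative.

```python
import numpy as np

def wilks_L(samples):
    """L = |b_ij| / |m_ij| of (28.56); each sample is an (n_l, p) array."""
    pooled = np.vstack(samples)
    b = sum((s - s.mean(axis=0)).T @ (s - s.mean(axis=0)) for s in samples)  # (28.46)-(28.47)
    dev = pooled - pooled.mean(axis=0)
    m = dev.T @ dev                       # pooled samples about the pooled means
    return np.linalg.det(b) / np.linalg.det(m), b

rng = np.random.default_rng(5)
s1 = rng.standard_normal((9, 3))
s2 = rng.standard_normal((14, 3)) + np.array([0.5, 0.0, -0.3])
L, b = wilks_L([s1, s2])

# For k = 2, |m| = |b| {1 + (n1 n2 / n) d' b^{-1} d}, d = difference of the sample means.
n1, n2 = len(s1), len(s2)
d = s1.mean(axis=0) - s2.mean(axis=0)
L_two_sample = 1 / (1 + n1 * n2 / (n1 + n2) * d @ np.linalg.solve(b, d))
print(L, L_two_sample)
```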


28.17. The distribution of L for general k is not easily expressible, but we may
determine its moments by the method employed in 28.15. The functions $b_{ij(l)}$ are dis-
tributed in Wishart’s form and their moments accordingly given by equations of the type
(28.42) with n replaced by n − 1, namely,


tiPf p 




. (28.57) 


Now each $b_{ij(l)}$ is distributed in Wishart’s form, and therefore their sum is so distributed
(cf. Exercise 28.3). In the manner of 28.15 (we omit the details) it is found that


h A_ 


p/n —m\p/n 

fr \ 2 ) \ 


m + 1 — i 


.)r(» 


— m + I — 

~2 , 


. (28.58) 


where we now use m as an index of summation, reserving k for the number of samples. 
This gives us the moments of L. 

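In the case k = 2 we have

$$\mu_r'(L) = \frac{\Gamma\left(\frac{n-p-1}{2} + r\right)\Gamma\left(\frac{n-1}{2}\right)}{\Gamma\left(\frac{n-p-1}{2}\right)\Gamma\left(\frac{n-1}{2} + r\right)}, \qquad (28.59)$$

and hence the distribution of L is in the form

$$dF = \frac{1}{B\left(\frac{n-p-1}{2}, \frac{p}{2}\right)}\,L^{(n-p-3)/2}(1-L)^{(p-2)/2}\,dL. \qquad (28.60)$$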


In the case k = 3 we find 




p - 2 ^ ' 

' n — p— 2' 
which, in virtue of the relation

$$\Gamma\left(x + \tfrac12\right)\Gamma(x + 1) = \frac{\sqrt{\pi}\,\Gamma(2x + 1)}{2^{2x}},$$

becomes

$$\mu_r'(L) = \frac{\Gamma(n-2)\,\Gamma(n-p-2+2r)}{\Gamma(n-2+2r)\,\Gamma(n-p-2)}. \qquad (28.61)$$

These are the moments of the distribution

$$dF = \frac{1}{2B(n-p-2,\ p)}\,(\sqrt{L})^{n-p-4}\left(1 - \sqrt{L}\right)^{p-1}dL, \qquad (28.62)$$

a rather unusual form. The results are due to Wilks.


28.18. The line of generalisation of univariate analysis will now probably be clear.
Corresponding to most of our results for a single variate there will be a generalised result
for p variates; and, in fact, if we like to regard the p-variate as a vector we can often draw
direct analogies between results for vectors and those for the (univariate) scalar. It is
of special interest to observe that the role played by the variance in univariate theory is
taken over by the determinant of the dispersion matrix in multivariate theory.

Up to this point we have generalised the distribution of variance (the χ²-distribution)
into Wishart’s form, and the t-distribution into Hotelling’s form.

Other results which suggest themselves for generalisation are regression and variance
analysis. But in a sense our treatment of regressions is already general, for we have dis-
cussed the regression of one variate on p − 1 others. Below we shall go further and
examine the relations between p dependent and q independent variates. In vector lan-
guage, we consider the regression of a p-way vector y on a q-way vector x. We have also
considered the analysis of variance for the bivariate and trivariate case in Chapter 24
under the title of analysis of covariance, and since the interest lies mainly in the direction
of regressions we shall not take the subject further here, though it is capable of develop-
ment and even, perhaps, of application if data become available in sufficient abundance.
In the remainder of the chapter we shall, in the first instance, deal with an offshoot of
regression theory which has some interesting taxonomic applications, namely discrimina-
tory analysis; and we shall then proceed to the general problem of the relationship between
two sets of variates.


Discriminatory Analysis 

28.19. Suppose we have p observations for each of 2n sample members, and that each member can have emanated from one of two populations, n to each population. We require to find some measurement depending on the p observations which will enable us to assign subsequently drawn members correctly to their parent populations with the greatest assurance of success. For this purpose we shall find p quantities $\lambda^j$ and a discriminant function X related linearly to the variates by

$$X = \lambda^j x_j. \qquad (28.63)$$

The criterion on which we shall rely is that the λ's must be chosen to maximise the ratio of the difference between the sample means of X to its standard deviation within the two classes.

Any linear function of type (28.63) has variance S, given by

$$S = \lambda^i \lambda^j a_{ij}, \qquad (28.64)$$

where, as usual, $a_{ij}$ is the covariance of $x_i$ and $x_j$, which we assume to be the same for both populations. Further, if the difference of the two means of $x_j$ is $d_j$, the difference of the means of the function X for the two samples is

$$D = \lambda^j d_j. \qquad (28.65)$$

We then have to maximise, for variation in the λ's, the function

$$\frac{D^2}{S} = \frac{(\lambda^j d_j)^2}{\lambda^i \lambda^j a_{ij}}. \qquad (28.66)$$

This gives, for each λ,

$$\frac{2}{D}\,\frac{\partial D}{\partial \lambda^k} = \frac{1}{S}\,\frac{\partial S}{\partial \lambda^k},$$

leading to equations typified by

$$\lambda^j a_{jk} = \frac{S}{D}\, d_k. \qquad (28.67)$$

Multiplying by $a^{ik}$, the elements of the matrix inverse to $(a_{jk})$, and summing over k, we have

$$\lambda^i = \frac{S}{D}\, a^{ik} d_k. \qquad (28.68)$$

This determines the λ's, except for the constant S/D, which can be chosen at will so far as the discriminant function is concerned. If c is some constant, we have

$$\lambda^i = c\, a^{ik} d_k. \qquad (28.69)$$

The result also holds if there are $n_1$ members in the first sample and $n_2$ in the second. Equation (28.66) remains true, and the rest of the analysis is the same as for equal class-numbers.
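A small NumPy sketch of (28.69), offered as an illustration only: the matrix and mean differences are arbitrary random values, and the function names are hypothetical. The coefficients are found by solving $a\lambda = c\,d$ rather than by forming the inverse matrix, and the code checks that no random trial vector gives a larger value of the criterion (28.66). By Cauchy's inequality the maximum equals $d_i a^{ik} d_k$, which is also checked.

```python
import numpy as np

def discriminant_coefficients(a, d, c=1.0):
    """Coefficients lambda^i = c a^{ik} d_k of (28.69), found by solving a @ lam = c * d."""
    return c * np.linalg.solve(a, d)

def criterion(lam, a, d):
    """The ratio D^2 / S of (28.66) for a trial vector of coefficients lam."""
    return (lam @ d) ** 2 / (lam @ a @ lam)

rng = np.random.default_rng(0)
p = 4
b = rng.standard_normal((p, p))
a = b @ b.T + p * np.eye(p)   # arbitrary positive-definite matrix standing in for (a_ij)
d = rng.standard_normal(p)    # arbitrary differences of class means

lam = discriminant_coefficients(a, d)
best = criterion(lam, a, d)
trials = [criterion(rng.standard_normal(p), a, d) for _ in range(1000)]
print(best, d @ np.linalg.solve(a, d), max(trials))
```

Because (28.66) is unaltered by a change of scale in the λ's, any value of c gives the same discriminant up to a multiple.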


Example 28.2 (from R. A. Fisher, 1936a).

Measurements were made on fifty specimens of flowers from each of two species of iris, setosa and versicolor, found growing in the same colony. Four measurements were taken, viz. sepal length, sepal width, petal length and petal width. We denote them by $x_1$, $x_2$, $x_3$ and $x_4$ respectively.

The means of the specimens were (in centimetres):

Variate    Versicolor    Setosa    Difference (V - S)
x1         5.936         5.006      0.930
x2         2.770         3.428     -0.658
x3         4.260         1.462      2.798
x4         1.326         0.246      1.080

The sums of squares and products about the means were (in cm.²):

         x1          x2          x3          x4
x1    19.1434      9.0356      9.7634      3.2394
x2     9.0356     11.8658      4.6232      2.4746
x3     9.7634      4.6232     12.2978      3.8794
x4     3.2394      2.4746      3.8794      2.4604

The inverse matrix is, in cm.⁻²:

          x1             x2             x3             x4
x1    0.118,7161    -0.066,8666    -0.081,6168     0.039,6350
x2   -0.066,8666     0.145,2736     0.033,4101    -0.110,7629
x3   -0.081,6168     0.033,4101     0.219,3614    -0.272,0206
x4    0.039,6350    -0.110,7629    -0.272,0206     0.894,5606


We need not bother to divide these quantities by n, because there is an arbitrary constant in our discriminant function which absorbs it. The matrices are symmetric about the diagonal, and it is not always necessary to write out the values below the diagonal as we have done here.

From (28.69), with c = 1, we then find

λ¹ = -0.031,1511,   λ² = -0.183,9076,   λ³ = 0.222,1044,   λ⁴ = 0.314,7370.

If we choose the coefficient of $x_1$ to be unity, the discriminant function is then

$$X = x_1 + 5.9037\,x_2 - 7.1299\,x_3 - 10.1036\,x_4. \qquad (28.70)$$

The mean of X for versicolor, obtained by substituting the means of the x's for that species, is found to be -21.4815, and that for setosa is 12.3345. The difference is thus 33.816 cm. Let us compare this with its standard error to see whether it is significant of real differences in the values of X for the two species.

From the matrix of sums of squares and products we find

$$N \operatorname{var} X = \lambda^i \lambda^j a_{ij} = 1085.5522,$$

where the λ's are, of course, the coefficients in (28.70). N here is the number of degrees of freedom of the estimate of the variance. There are 100 members altogether, with 99 degrees of freedom, but we have eliminated four corresponding to the means of the four variates. We therefore take N to be 99 - 4 = 95, and find

$$\operatorname{var} X = 11.4269.$$

This is the variance of a single value. That of the difference of the two means of 50 values is obtained by division by 25, and is thus 0.4571, the corresponding standard error being 0.676.

The observed difference of means, viz. 33.816, is about 50 times this amount, and there is thus a real difference in the values of X for the two species. In other words the discriminant function is a good one. It is the best among the linear functions of the x's because we have chosen it so that the difference of two values, divided by their estimated standard error, shall be the greatest possible. To use the function we should, given a flower of doubtful species, calculate X for it and assign it to one species or the other according as X is nearer to the mean value of X for the one species or the other. If, of course, the observed value differed from both mean values by more than twice the standard error of each, we should begin to doubt whether it belonged to either.

The analysis may be put in rather a different way. Suppose we analyse the variation of X between and within species. The sum of squares between species in the 50 × 2 classification is

$$50\{(\bar X_1 - \bar X)^2 + (\bar X_2 - \bar X)^2\},$$

where $\bar X_1$, $\bar X_2$ are the respective means and $\bar X$ the mean of the whole. This reduces to $25D^2$. The sum of squares within classes is 1085.55 with 95 d.f., as found above, and we have:

                     Sum of squares    d.f.
Between species         28,588.05        4
Within species           1,085.55       95
Totals                  29,673.60       99

Our method of selecting the discriminant function has been such as to minimise the sum of squares within species and, for constant total, to maximise the sum between species, and hence to maximise the ratio of the latter to the former. For the moment we cannot assume that this ratio may be tested in the z-distribution in the usual way, though we shall see presently that this is so.
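The arithmetic of Example 28.2 can be reproduced with a short NumPy sketch. This is our own illustration, not Fisher's computing procedure: it starts only from the species means and the within-species sums of squares and products quoted above, solves for the λ's instead of inverting the matrix, and follows the text in taking N = 95. The variable names are our own.

```python
import numpy as np

# Example 28.2: x1..x4 = sepal length, sepal width, petal length, petal width (cm)
a = np.array([                      # sums of squares and products within species
    [19.1434,  9.0356,  9.7634, 3.2394],
    [ 9.0356, 11.8658,  4.6232, 2.4746],
    [ 9.7634,  4.6232, 12.2978, 3.8794],
    [ 3.2394,  2.4746,  3.8794, 2.4604],
])
mean_vers = np.array([5.936, 2.770, 4.260, 1.326])
mean_set = np.array([5.006, 3.428, 1.462, 0.246])
d = mean_vers - mean_set

lam = np.linalg.solve(a, d)         # (28.69) with c = 1
coef = lam / lam[0]                 # coefficient of x1 set to unity, as in (28.70)

x_vers, x_set = coef @ mean_vers, coef @ mean_set
diff = x_set - x_vers               # difference of the species means of X
within = coef @ a @ coef            # N var X, the sum of squares within species
var_x = within / 95                 # N = 99 - 4, following the text
se_diff = np.sqrt(var_x / 25)       # difference of two means of 50 values each
between = 25 * diff ** 2            # 25 D^2

print(np.round(coef, 4), round(x_vers, 4), round(x_set, 4))
print(round(within, 2), round(var_x, 4), round(se_diff, 3), round(diff / se_diff, 1))
print(round(between, 2), round(between + within, 2))
```

The printed values should agree, to the accuracy of the quoted figures, with (28.70), the means of X, the standard error 0.676 and the analysis of variance above.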


28.20. The relationship between discriminatory analysis for two classes and the theory of regression may be brought out by introducing a formal variate y for the classes. If there are $n_1$ members in one class and $n_2$ in the other, we shall assign the values

$$\frac{n_2}{n_1+n_2}, \qquad -\frac{n_1}{n_1+n_2}$$

to the y-variate for the two classes respectively. The mean of y for the whole sample is then zero and the sum of squares is

$$\frac{n_1 n_2}{n_1+n_2} = k, \text{ say}. \qquad (28.71)$$

Considering now

$$Y = \lambda^j x_j \qquad (28.72)$$

as a regression equation, with the x's measured from the mean of the whole sample, we find for the coefficients λ

$$\Sigma(Y x_j) - \Sigma(y x_j) = 0,$$

or

$$\lambda^i a_{ij} = \Sigma(y x_j).$$

Now

$$\Sigma(y x_j) = \frac{n_1 n_2}{n_1+n_2}(\bar x_{j1} - \bar x_{j2}) = k\,d_j, \qquad (28.73)$$

where the suffixes 1 and 2 of the $\bar x$'s relate to the first and second classes, and $d_j = \bar x_{j1} - \bar x_{j2}$. Thus

$$k\,d_j = \lambda^i a_{ij}, \qquad (28.74)$$

which, since the sums of squares and products about the general mean differ from those within classes only by the term $k\,d_i d_j$, is another way of writing (28.69) with a particular value for the constant c.


28.21. Pursuing the analogy with regression analysis further, we see that since

$$\Sigma(Y^2) = \lambda^i \lambda^j a_{ij} = k\,\lambda^j d_j \quad\text{and}\quad \Sigma(Y x_j) = k\,d_j,$$

we may analyse the sums of squares as

$$\begin{array}{lcc} & \text{Sum of squares} & \text{d.f.} \\ \text{Regression} & k\lambda^j d_j & p \\ \text{Residual} & k(1-\lambda^j d_j) & n_1+n_2-p-1 \\ \text{Total} & k & n_1+n_2-1 \end{array} \qquad (28.75)$$

as for a regression line. If R is the multiple correlation of y on the x-variates,

$$R^2 = \lambda^j d_j. \qquad (28.76)$$

In ordinary regression analysis we may test the ratio $R^2/(1-R^2)$, multiplied by suitable constants, in the z-distribution; but this depends on the assumption that the dependent variate y is normal for any fixed x's. Here we have the case where the dependent variate is fixed but the x's are normal. The test still holds in such a case, the reason being the kind of duality we noted in 28.14 in arriving at Hotelling's distribution: the distribution of angles between a fixed plane and a random vector is the same as that between a fixed vector and a random plane. Consequently the table of (28.75) can be regarded as an analysis of variance and the z-test applied.
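The regression formulation of 28.20 and 28.21 can be checked numerically. The sketch below (ours; the data are arbitrary simulated values and the function name is hypothetical) regresses the pseudo-variate y on the x's measured from the general mean, then confirms (28.74), the identity (28.76), and that the coefficients are proportional to those obtained from the within-class matrix as in (28.69).

```python
import numpy as np

def discriminant_by_regression(x1, x2):
    """Regress the pseudo-variate y of 28.20 on the x's (measured from the general mean)."""
    n1, n2 = len(x1), len(x2)
    x = np.vstack([x1, x2])
    x = x - x.mean(axis=0)
    y = np.concatenate([np.full(n1, n2 / (n1 + n2)), np.full(n2, -n1 / (n1 + n2))])
    a = x.T @ x                          # total sums of squares and products
    lam = np.linalg.solve(a, x.T @ y)    # least-squares coefficients of Y = lambda^j x_j
    k = n1 * n2 / (n1 + n2)              # sum of squares of y, (28.71)
    d = x1.mean(axis=0) - x2.mean(axis=0)
    r2 = (lam @ a @ lam) / k             # R^2 = S(Y^2) / S(y^2)
    return lam, a, d, k, r2

rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, size=(30, 3))
x2 = rng.normal(0.5, 1.0, size=(45, 3))
lam, a, d, k, r2 = discriminant_by_regression(x1, x2)

# Within-class sums of squares and products, for comparison with (28.69)
w = (x1 - x1.mean(0)).T @ (x1 - x1.mean(0)) + (x2 - x2.mean(0)).T @ (x2 - x2.mean(0))
lam_within = np.linalg.solve(w, d)
print(r2, lam @ d)                       # (28.76): the two agree
print(lam / lam_within)                  # a constant ratio: same discriminant up to scale
```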

28.22. We may extend the discriminant function to the case where the property to be discriminated is not, as above, a matter of allocation to one of two classes, but to one of several, which may in particular be determined by certain values of a continuous variate. If we have measurements of p x-variates corresponding to values of a y-variate, we may form the regression of y on the x's and use the resulting function as a discriminator. As in the case of dichotomy, the regression will maximise the difference between classes as compared with intra-class variation; and its significance may be tested in much the same way.

Example 28.3 (from M. M. Barnard, 1935).

An investigation was undertaken into the changes taking place over time in the characteristics of certain Egyptian skulls. There were four sets of skulls, known to be from the Late Predynastic, Sixth to Twelfth, Twelfth to Thirteenth and Ptolemaic dynasties respectively, and the relative time-intervals between them were taken to be in the proportions 2 : 1 : 2, so that the values of t for the four periods may be taken to be respectively -5, -1, +1, +5. For the skulls four measurements were selected:

$x_1$, basi-alveolar length;

$x_2$, nasal height;

$x_3$, maximum breadth;

$x_4$, basi-bregmatic height.

It is required to find a function

$$X = \lambda^1 x_1 + \lambda^2 x_2 + \lambda^3 x_3 + \lambda^4 x_4$$

which will best discriminate between skulls belonging to different periods.

The means of the series were as follows, the sample numbers also being shown:

Variate   Series I       Series II      Series III     Series IV
          (n1 = 91)      (n2 = 162)     (n3 = 70)      (n4 = 75)
x1        133.582,418    134.265,432    134.371,429    135.306,667
x2         98.307,692     96.462,963     95.857,143     95.040,000
x3         50.835,165     51.148,148     50.100,000     52.093,333
x4        133.000,000    134.882,716    133.642,857    131.466,667

The sums of squares and products about the means are:

          x1              x2              x3              x4
x1    9661.997,470     445.573,301    1130.623,900    2148.584,219
x2                    9073.115,027    1239.221,990    2255.812,722
x3                                    3938.320,351    1271.054,662
x4                                                    8741.508,829

The mean value of t for the 398 observations is -0.432,161, and the values of $t - \bar t$ for the four series are accordingly

-4.567,839;   -0.567,839;   1.432,161;   5.432,161.

The sums $\Sigma x_j(t - \bar t)$ are respectively

x1      718.762,86
x2    -1407.260,75
x3      410.101,94
x4     -733.668,32

and finally, $\Sigma(t - \bar t)^2$ = 4307.668,32.

We could obtain the coefficients λ from the reciprocal of the matrix above on the lines of the previous example. It is also instructive to observe, from the analogy with regressions, that instead of that matrix we may use the matrix (depending on one extra degree of freedom, 395 in all) obtained by adding to the sums of squares and products the regressions on time. For instance, instead of 9661.997,470 we have 9661.997,470 + (718.762,86)²/4307.668,32. The resulting matrix is

          x1              x2              x3              x4
x1    9781.927,828     210.762,489    1199.052,135    2026.206,952
x2                    9532.849,476    1105.246,827    2495.414,318
x3                                    3977.363,203    1201.230,304
x4                                                    8866.382,928

The reciprocal of this is (units = 10⁻⁶):

          x1            x2            x3            x4
x1    110.368,975     6.938,481   -28.145,236   -23.361,935
x2                  115.693,529   -24.948,984   -30.767,069
x3                                273.988,409   -23.666,691
x4                                              129.990,069


The resulting values of λ are

λ¹ = 0.075,156,739,   λ² = -0.145,490,050,
λ³ = 0.144,600,884,   λ⁴ = -0.078,538,419,

and these, or constant multiples of them, give us the constants in the discriminant function which will best enable us to assign a skull to the correct period by measurements of the four specified variates.

In this analysis we have 398 members, and hence 397 d.f. about the general mean, of which we have discarded two. The d.f. of the sum $\Sigma(t - \bar t)^2$ = 4307.6683 are 395, of which four are attributable to regressions on the x-variates. For the contribution of these four we have

λ¹ × 718.762,86 + etc. = 375.6657.

The analysis of variance is thus:

               Sum of squares    d.f.    Quotient
Regression         375.6657         4
Remainder         3932.0026       391      10.0563
Totals            4307.6683       395

The analogy of the discriminant function with regressions noted above may be used to provide standard errors of the coefficients λ. In our present case the variance of λ¹ is obtained by multiplying the remainder quotient, viz. 10.0563, by the term corresponding to $x_1^2$ in the reciprocal matrix of sums of squares of the x's, namely 110.368,975 × 10⁻⁶. This gives a standard error of 0.0333. We obtain finally

λ¹ = 0.0752 ± 0.0333
λ² = -0.1455 ± 0.0341
λ³ = 0.1446 ± 0.0525
λ⁴ = -0.0785 ± 0.0362.

All coefficients exceed twice their standard error, and hence all the variates are useful in discriminating between skulls of different periods.
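The standard errors can be recomputed in a few lines of Python (our sketch, standard library only): each is the square root of the remainder quotient times the corresponding diagonal term of the reciprocal matrix, and the last column gives each coefficient as a multiple of its standard error.

```python
import math

remainder_ss, remainder_df = 3932.0026, 391
quotient = remainder_ss / remainder_df                  # remainder mean square
# Diagonal of the reciprocal matrix (units 10^-6) and the coefficients lambda
recip_diag = [110.368975e-6, 115.693529e-6, 273.988409e-6, 129.990069e-6]
lam = [0.075156739, -0.145490050, 0.144600884, -0.078538419]

se = [math.sqrt(quotient * c) for c in recip_diag]      # standard error of each lambda
ratios = [abs(l) / e for l, e in zip(lam, se)]          # coefficient / standard error
for l, e, r in zip(lam, se, ratios):
    print(f"{l:+.4f} +/- {e:.4f}   ratio {r:.2f}")
```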

I am indebted to Dr. M. S. Bartlett for the calculations of this example. His results 
differ from those reached by Miss Barnard in her original investigation since she took an 
unweighted regression of the variates with time, whereas he has weighted the values 
according to sample numbers. He also notes that the significance of the results has been 
tested above on the basis of variability within classes, but that a fuller analysis of the means, 
bringing back the two degrees of freedom discarded, reveals further differences between the 
series. Thus, though the discriminant function will efficiently sort the series examined in 
relation to their periods, we must be cautious about associating the observed differences 
with the time-changes. 


Canonical Correlations


28.23. We now turn to consider the general theory of the relations between two sets of variates $x_1, \ldots, x_p$ and $x_{p+1}, \ldots, x_{p+q}$, where we suppose that p ≤ q. Following Hotelling (1936b), we shall show that in general there can be found linear transformations to variates $\xi_1, \ldots, \xi_p$, $\xi_{p+1}, \ldots, \xi_{p+q}$ such that

(a) all the ξ's have unit variance and zero mean;

(b) any ξ in the p-group is independent of the other ξ's in that group;

(c) any ξ in the q-group is independent of the other ξ's in that group;

(d) the correlation between any ξ in the p-group and any ξ in the q-group is zero except for p correlations $\rho_1, \ldots, \rho_p$, which may be taken to be the correlations between $\xi_1$ and $\xi_{p+1}$, $\xi_2$ and $\xi_{p+2}$, ..., $\xi_p$ and $\xi_{2p}$.

The variates ξ are then said to be canonical variates and the ρ's canonical correlations.

This part of our work is, fundamentally, the reduction of two quadratic forms and an associated bilinear form to canonical types, and does not depend on the distribution laws of the variates. Furthermore, the reduction can be carried out either on the population or on the sample. In the latter case it will yield sample canonical correlations, which may be written $r_1, \ldots, r_p$ and regarded as sample values of the parent ρ's.

We will suppose that our variates x have zero means and dispersions denoted by σ, where, for the time being, we use σ to denote a variance or covariance instead of the more usual σ² for a variance. Those dispersions in the p-group we denote by Greek affixes, $\sigma_{\alpha\beta}$, and those in the q-group by Roman affixes, $\sigma_{ij}$. For a covariance of a p-variate with a q-variate we write one Greek and one Roman affix: $\sigma_{\alpha i}$.

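Consider now a particular pair of variates given by

$$\xi = c^\alpha x_\alpha, \quad \alpha = 1, \ldots, p; \qquad \eta = d^i x_i, \quad i = p+1, \ldots, p+q. \qquad (28.77)$$

If their variances are unity we have

$$c^\alpha c^\beta \sigma_{\alpha\beta} = 1, \qquad d^i d^j \sigma_{ij} = 1. \qquad (28.78)$$

We will also impose the condition that their correlation R is stationary for variations in the coefficients c and d, i.e. that

$$R = c^\alpha d^i \sigma_{\alpha i} \quad \text{is stationary}. \qquad (28.79)$$

Equations (28.78) and (28.79) then require an unconditioned stationary value of

$$c^\alpha d^i \sigma_{\alpha i} - \tfrac{1}{2}\lambda\, c^\alpha c^\beta \sigma_{\alpha\beta} - \tfrac{1}{2}\mu\, d^i d^j \sigma_{ij}, \qquad (28.80)$$

where λ and μ are undetermined multipliers. This leads to

$$d^i \sigma_{\alpha i} - \lambda\, c^\beta \sigma_{\alpha\beta} = 0, \qquad c^\alpha \sigma_{\alpha i} - \mu\, d^j \sigma_{ij} = 0. \qquad (28.81)$$

Multiplying the first equation by $c^\alpha$ and summing, and the second by $d^i$ and summing, we have, in virtue of (28.78) and (28.79),

$$R = \lambda = \mu. \qquad (28.82)$$

Equations (28.81) will then be soluble for the p + q unknowns c and d if the determinant of their array vanishes, that is if, writing λ for the equal constants λ and μ,

$$\begin{vmatrix} -\lambda\sigma_{11} & \cdots & -\lambda\sigma_{1p} & \sigma_{1,p+1} & \cdots & \sigma_{1,p+q} \\ \vdots & & \vdots & \vdots & & \vdots \\ -\lambda\sigma_{p1} & \cdots & -\lambda\sigma_{pp} & \sigma_{p,p+1} & \cdots & \sigma_{p,p+q} \\ \sigma_{p+1,1} & \cdots & \sigma_{p+1,p} & -\lambda\sigma_{p+1,p+1} & \cdots & -\lambda\sigma_{p+1,p+q} \\ \vdots & & \vdots & \vdots & & \vdots \\ \sigma_{p+q,1} & \cdots & \sigma_{p+q,p} & -\lambda\sigma_{p+q,p+1} & \cdots & -\lambda\sigma_{p+q,p+q} \end{vmatrix} = 0, \qquad (28.83)$$

an equation determining λ. Before studying it further we will throw the equation into a somewhat different form.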


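28.24. We may write (28.83) as

$$\begin{vmatrix} -\lambda\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & -\lambda\sigma_{ij} \end{vmatrix} = 0. \qquad (28.84)$$

Multiplying the first p rows by -λ and dividing the last q columns by -λ, we find the equivalent form

$$(-\lambda)^{q-p} \begin{vmatrix} \lambda^2\sigma_{\alpha\beta} & \sigma_{\alpha j} \\ \sigma_{i\beta} & \sigma_{ij} \end{vmatrix} = 0. \qquad (28.85)$$

Writing, in conformity with our usual notation, $(\sigma^{ij})$ for the matrix inverse to $(\sigma_{ij})$, and remembering that

$$\sigma_{ij}\sigma^{jk} = \delta_i^k,$$

let us multiply (28.85) on the left by the determinant

$$\begin{vmatrix} \delta_\gamma^\alpha & -\sigma_{\gamma j}\sigma^{jk} \\ 0 & \delta_i^k \end{vmatrix}, \qquad (28.86)$$

whose value is unity. The product of determinants is then

$$(-\lambda)^{q-p} \begin{vmatrix} \lambda^2\sigma_{\gamma\beta} - \sigma_{\gamma j}\sigma^{jk}\sigma_{k\beta} & 0 \\ \sigma_{i\beta} & \sigma_{ij} \end{vmatrix} = 0,$$

which gives

$$(-\lambda)^{q-p}\, |\sigma_{ij}|\; \left|\lambda^2\sigma_{\alpha\beta} - \sigma_{\alpha j}\sigma^{jk}\sigma_{k\beta}\right| = 0, \qquad (28.87)$$

a determinant with p rows and columns multiplied by a power of λ.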


28.25. Returning now to our original problem, we see that if a simple root of (28.83) is substituted in (28.81), the c's and d's are determinate, except of course that they may be replaced by -c and -d. For a root of multiplicity m they are determinate except for m - 1 assignable constants, a result we take without proof from the theory of algebraic forms (reference may be made to Hotelling's paper for details).

From (28.87) we see that the equation in λ has p + q roots. It cannot have fewer, for the coefficient of the highest power of λ in (28.83) is the product of two principal minors, which do not vanish unless the variates are linearly dependent, a case which we exclude from the discussion. Of these p + q roots, q - p are zero. The remaining 2p can be grouped in pairs, each of which is the negative of the other. There are thus roots which we may write $\pm\rho_1, \ldots, \pm\rho_p$. We choose as the roots those which are not negative and proceed to prove that they are the canonical correlations as we have defined them. That they are, in fact, correlations follows from (28.82).
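The statements about the roots of (28.83) are easy to check numerically. In the sketch below (ours; the dispersion matrix is an arbitrary random one and the function name is hypothetical), the array of (28.83) is written as $C - \lambda D$, with $C$ holding the cross-covariances and $D$ the two within-group dispersion matrices, so that the roots are the eigenvalues of $D^{-1}C$. We expect q - p zero roots and p pairs ±ρ, the positive members agreeing with the square roots of the roots of (28.87).

```python
import numpy as np

def determinantal_roots(sigma, p):
    """Roots in lambda of (28.83), sorted, for a (p+q) x (p+q) dispersion matrix."""
    q = sigma.shape[0] - p
    s11, s12 = sigma[:p, :p], sigma[:p, p:]
    s21, s22 = sigma[p:, :p], sigma[p:, p:]
    # The array of (28.83) is C - lambda * D with C and D as below
    c = np.block([[np.zeros((p, p)), s12], [s21, np.zeros((q, q))]])
    d = np.block([[s11, np.zeros((p, q))], [np.zeros((q, p)), s22]])
    return np.sort(np.linalg.eigvals(np.linalg.solve(d, c)).real)

rng = np.random.default_rng(2)
p, q = 2, 4
b = rng.standard_normal((p + q, p + q))
sigma = b @ b.T + np.eye(p + q)             # an arbitrary dispersion matrix
roots = determinantal_roots(sigma, p)
print(np.round(roots, 6))                   # pairs +/- rho and q - p zeros
```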

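Suppose we have a root $\rho_\gamma$ and determine the corresponding constants $c_\gamma$ and $d_\gamma$, and hence a pair of variates $\xi_\gamma$ and $\eta_\gamma$. Then we have, from (28.81),

$$c^\alpha_\gamma \sigma_{\alpha i} = \rho_\gamma\, d^j_\gamma \sigma_{ij}, \qquad d^i_\gamma \sigma_{\alpha i} = \rho_\gamma\, c^\beta_\gamma \sigma_{\alpha\beta}. \qquad (28.88)$$

Similar equations obtain for a second pair, say $\xi_\delta$ and $\eta_\delta$. Between these four variates there are six correlations, two of which are $\rho_\gamma$ and $\rho_\delta$. We wish to show that the other four vanish. They are

$$E(\xi_\gamma\xi_\delta) = c^\alpha_\gamma c^\beta_\delta\sigma_{\alpha\beta}, \quad E(\eta_\gamma\eta_\delta) = d^i_\gamma d^j_\delta\sigma_{ij}, \quad E(\xi_\gamma\eta_\delta) = c^\alpha_\gamma d^i_\delta\sigma_{\alpha i}, \quad E(\xi_\delta\eta_\gamma) = c^\alpha_\delta d^i_\gamma\sigma_{\alpha i}. \qquad (28.89)$$

Multiply the first of (28.88) by $d^i_\delta$ and sum. Using (28.89), we have

$$E(\xi_\gamma\eta_\delta) = \rho_\gamma E(\eta_\gamma\eta_\delta). \qquad (28.90)$$

Similarly, from the second of (28.88) multiplied by $c^\alpha_\delta$,

$$E(\xi_\delta\eta_\gamma) = \rho_\gamma E(\xi_\gamma\xi_\delta). \qquad (28.91)$$

Interchanging γ and δ, we find from (28.90) and (28.91)

$$\rho_\gamma E(\eta_\gamma\eta_\delta) = \rho_\delta E(\xi_\gamma\xi_\delta). \qquad (28.92)$$

Equally, again interchanging γ and δ in (28.92), we have

$$\rho_\delta E(\eta_\gamma\eta_\delta) = \rho_\gamma E(\xi_\gamma\xi_\delta). \qquad (28.93)$$

Thus, unless $\rho_\gamma^2 = \rho_\delta^2$,

$$E(\xi_\gamma\xi_\delta) = E(\eta_\gamma\eta_\delta) = 0. \qquad (28.94)$$

It follows from (28.90) and (28.91) that the other correlations also vanish.

We have only to round off the proof by showing that if ρ is a root of multiplicity m the property still holds. This follows from the consideration that we may then choose our c's and d's to obey certain orthogonality conditions ensuring that

$$E(\xi_\gamma\xi_\delta) + E(\eta_\gamma\eta_\delta) = 0. \qquad (28.95)$$

It will then follow from (28.92) that each expectation vanishes unless $\rho_\gamma = 0$; and even in this case, (28.90) and (28.91) show that two expectations vanish, and we may then choose our assignable constants so that the others vanish.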


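28.26. When the variates are put into canonical form the dispersion matrix reduces to

$$\left(\begin{array}{cccc|cccc|ccc}
1 & 0 & \cdots & 0 & \rho_1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 & 0 & \rho_2 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1 & 0 & 0 & \cdots & \rho_p & 0 & \cdots & 0 \\ \hline
\rho_1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 & 0 & \cdots & 0 \\
0 & \rho_2 & \cdots & 0 & 0 & 1 & \cdots & 0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \rho_p & 0 & 0 & \cdots & 1 & 0 & \cdots & 0 \\ \hline
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 0 & 0 & \cdots & 1
\end{array}\right) \qquad (28.96)$$

with a determinant equal to

$$(1-\rho_1^2)(1-\rho_2^2)\cdots(1-\rho_p^2).$$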
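A quick numerical check of the determinant just stated, as a NumPy sketch of ours. The canonical correlations chosen are arbitrary, and the helper name is hypothetical. The matrix has unit diagonal, and the i-th variate of the first group is correlated only with the i-th variate of the second group.

```python
import numpy as np

def canonical_dispersion(rhos, q):
    """Dispersion matrix (28.96): p canonical variates, then q canonical variates."""
    p = len(rhos)
    m = np.eye(p + q)
    for i, r in enumerate(rhos):
        m[i, p + i] = m[p + i, i] = r     # xi_i correlated only with xi_{p+i}
    return m

rhos = [0.8, 0.5, 0.1]
m = canonical_dispersion(rhos, q=5)
print(np.linalg.det(m), np.prod([1 - r * r for r in rhos]))
```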


Example 28.4 (from Hotelling, 1936b, dealing with data of T. L. Kelley).

140 seventh-grade school children were given four tests in (a) reading speed, (b) reading power, (c) arithmetic speed and (d) arithmetic power. It is required to find canonical variates for the two reading tests and the two arithmetic tests.

The correlations between the variates were:

         x1        x2        x3        x4
x1     1.0000    0.6328    0.2412    0.0586
x2     0.6328    1.0000   -0.0553    0.0655
x3     0.2412   -0.0553    1.0000    0.4248
x4     0.0586    0.0655    0.4248    1.0000

The determinant (28.83) becomes

$$\begin{vmatrix} -\lambda & -0.6328\lambda & 0.2412 & 0.0586 \\ -0.6328\lambda & -\lambda & -0.0553 & 0.0655 \\ 0.2412 & -0.0553 & -\lambda & -0.4248\lambda \\ 0.0586 & 0.0655 & -0.4248\lambda & -\lambda \end{vmatrix} = 0,$$

or

0.491,370 λ⁴ - 0.078,803,4 λ² + 0.000,362,490 = 0,

giving

λ² = 0.155,635 or 0.004,740,

with

λ = 0.3945 or 0.0688.


To find the transformed variates themselves we use (28.81). For instance, with the root 0.3945 for λ, we have, after dividing each equation by -λ,

c¹ + 0.6328 c² - 0.6114 d³ - 0.1485 d⁴ = 0
0.6328 c¹ + c² + 0.1402 d³ - 0.1660 d⁴ = 0
-0.6114 c¹ + 0.1402 c² + d³ + 0.4248 d⁴ = 0
-0.1485 c¹ - 0.1660 c² + 0.4248 d³ + d⁴ = 0

The last equation is linearly dependent on the other three, so adds nothing. In the other three we solve for the ratios of the c's and d's, finding

c¹ : c² : d³ : d⁴ = -2.7772 : 2.2655 : -2.4404 : 1.

Thus the transformed variates are

$$k_1\xi = -2.7772\,x_1 + 2.2655\,x_2, \qquad k_2\eta = -2.4404\,x_3 + x_4,$$

where $k_1$ and $k_2$ may be chosen so that the variances of ξ and η are unity, if desired. Similar equations with the root 0.0688 will give us a further pair of canonical variates. Those we have worked out have the maximum correlation, the other pair having the minimum and therefore being of less interest.
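Example 28.4 can be reproduced numerically. In this NumPy sketch (ours, not Hotelling's procedure) the squared canonical correlations are taken as the eigenvalues of $R_{11}^{-1}R_{12}R_{22}^{-1}R_{21}$, which is (28.87) written in matrix form. The c's come from the leading eigenvector and the d's from the second equation of (28.81), $d = R_{22}^{-1}R_{21}c/\rho$; everything is then scaled so that the coefficient of $x_4$ is unity, as in the text.

```python
import numpy as np

# Correlation matrix of Example 28.4: x1, x2 reading speed/power; x3, x4 arithmetic speed/power
r = np.array([
    [1.0000,  0.6328,  0.2412, 0.0586],
    [0.6328,  1.0000, -0.0553, 0.0655],
    [0.2412, -0.0553,  1.0000, 0.4248],
    [0.0586,  0.0655,  0.4248, 1.0000],
])
r11, r12, r21, r22 = r[:2, :2], r[:2, 2:], r[2:, :2], r[2:, 2:]

# Squared canonical correlations: roots of |rho^2 R11 - R12 R22^{-1} R21| = 0, cf. (28.87)
m = np.linalg.solve(r11, r12 @ np.linalg.solve(r22, r21))
vals, vecs = np.linalg.eig(m)
order = np.argsort(vals.real)[::-1]
rho = np.sqrt(vals.real[order])

# Coefficients for the largest root: c from the eigenvector, d = R22^{-1} R21 c / rho
c = vecs[:, order[0]].real
d = np.linalg.solve(r22, r21 @ c) / rho[0]
scale = d[1]                 # normalise so that the coefficient of x4 is 1, as in the text
c, d = c / scale, d / scale
print(np.round(rho, 4), np.round(c, 4), np.round(d, 4))
```

Small differences from the figures in the text in the fourth decimal place of the coefficients arise because the text works with the rounded root 0.3945.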


28.27. In practical cases it is of some importance to know whether an observed canonical correlation $r_1$, say, is significant of real correlation. The problem has been solved for large samples but not completely for small samples. We shall conclude this chapter with a short account of the main results which have been reached.

For large samples we shall show that, for the sampling variance of a canonical correlation,

$$\operatorname{var} r = \frac{1}{n}(1-\rho^2)^2, \qquad (28.97)$$

a remarkable result showing that the variance is the same as for a product-moment coefficient.

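Denoting as usual the sample covariance by $a_{ij}$, we have to the first order

$$E(a_{ij}) = \sigma_{ij}. \qquad (28.98)$$

To the same order,

$$E(a_{ij}a_{kl}) = \frac{1}{n^2}\, E\Big\{\sum_{\alpha} x_{i\alpha}x_{j\alpha} \sum_{\beta} x_{k\beta}x_{l\beta}\Big\}.$$

If α ≠ β the sums on the right are independent, and there are n(n - 1) such cases. When α = β we have n terms such as

$$E(x_{i\alpha}x_{j\alpha}x_{k\alpha}x_{l\alpha}) = \sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}, \qquad (28.99)$$

as follows from the consideration that the characteristic function of the multivariate normal form is

$$\exp(-\tfrac{1}{2}\sigma_{ij}t_i t_j)$$

(cf. 15.12, vol. I, p. 376). Hence we have

$$E(a_{ij}a_{kl}) = \frac{n(n-1)}{n^2}\,\sigma_{ij}\sigma_{kl} + \frac{n}{n^2}\,(\sigma_{ij}\sigma_{kl} + \sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}) = \sigma_{ij}\sigma_{kl} + \frac{1}{n}(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}). \qquad (28.100)$$

Thus

$$E(\delta a_{ij}\,\delta a_{kl}) = E(a_{ij}a_{kl}) - \sigma_{ij}\sigma_{kl} = \frac{1}{n}(\sigma_{ik}\sigma_{jl} + \sigma_{il}\sigma_{jk}). \qquad (28.101)$$

Now for any canonical correlation r we have

$$c^\alpha c^\beta a_{\alpha\beta} = 1, \qquad d^i d^j a_{ij} = 1, \qquad r = c^\alpha d^i a_{\alpha i}. \qquad (28.102)$$

If now we define, for the sampling deviations in the c's and d's corresponding to deviations in the a's,

$$\delta c^\alpha = \sum \frac{\partial c^\alpha}{\partial a}\,\delta a, \qquad \delta d^i = \sum \frac{\partial d^i}{\partial a}\,\delta a, \qquad (28.103)$$

we find, to the first order,

$$2c^\alpha\,\delta c^\beta\, a_{\alpha\beta} + c^\alpha c^\beta\,\delta a_{\alpha\beta} = 0, \qquad 2d^i\,\delta d^j\, a_{ij} + d^i d^j\,\delta a_{ij} = 0,$$

$$\delta r = \delta c^\alpha\, d^i a_{\alpha i} + c^\alpha\,\delta d^i\, a_{\alpha i} + c^\alpha d^i\,\delta a_{\alpha i}. \qquad (28.104)$$

Without loss of generality we may now suppose the variates canonical and hence put $c^1 = 1$, $c^2 = \cdots = c^p = 0$, $d^{p+1} = 1$, $d^{p+2} = \cdots = d^{p+q} = 0$. We then find

$$2\,\delta c^1 + \delta a_{11} = 0, \qquad 2\,\delta d^{p+1} + \delta a_{p+1,p+1} = 0, \qquad \delta r_1 = r_1\,\delta c^1 + r_1\,\delta d^{p+1} + \delta a_{1,p+1}.$$

Substituting from the first two in the third of these equations, we have

$$\delta r_1 = \delta a_{1,p+1} - \tfrac{1}{2}r_1(\delta a_{11} + \delta a_{p+1,p+1}). \qquad (28.105)$$

Similar equations apply for any other simple root, e.g.

$$\delta r_2 = \delta a_{2,p+2} - \tfrac{1}{2}r_2(\delta a_{22} + \delta a_{p+2,p+2}).$$

Squaring these equations and substituting from (28.101), we find

$$nE(\delta r_1)^2 = (1-r_1^2)^2, \qquad E(\delta r_1\,\delta r_2) = 0. \qquad (28.106)$$

It follows that

$$\operatorname{var} r_1 = \frac{1}{n}(1-\rho_1^2)^2, \qquad \operatorname{cov}(r_1, r_2) = 0, \qquad (28.107)$$

to our order of approximation.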
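The large-sample result (28.107) lends itself to a Monte Carlo check. The sketch below is our own illustration: the parent canonical correlations 0.6 and 0.3, p = q = 2, n = 500 and 2000 replications are arbitrary choices. It samples from a normal population already in the canonical form (28.96), computes the sample canonical correlations, and compares n times their variances with $(1-\rho^2)^2$.

```python
import numpy as np

def sample_canonical_correlations(x, p):
    """Sample canonical correlations between x[:, :p] and x[:, p:], largest first."""
    s = np.cov(x, rowvar=False)
    s11, s12, s21, s22 = s[:p, :p], s[:p, p:], s[p:, :p], s[p:, p:]
    m = np.linalg.solve(s11, s12 @ np.linalg.solve(s22, s21))
    u = np.sort(np.linalg.eigvals(m).real)[::-1]     # squared canonical correlations
    return np.sqrt(np.clip(u, 0.0, None))

def simulate(rhos=(0.6, 0.3), q=2, n=500, reps=2000, seed=1):
    """Draw `reps` normal samples of size n from the canonical form (28.96)."""
    p = len(rhos)
    sigma = np.eye(p + q)
    for i, r in enumerate(rhos):
        sigma[i, p + i] = sigma[p + i, i] = r
    chol = np.linalg.cholesky(sigma)
    rng = np.random.default_rng(seed)
    out = np.empty((reps, p))
    for k in range(reps):
        x = rng.standard_normal((n, p + q)) @ chol.T
        out[k] = sample_canonical_correlations(x, p)
    return out

n = 500
r = simulate(n=n)
print(n * r.var(axis=0))                     # compare (1 - 0.36)^2 = 0.4096, (1 - 0.09)^2 = 0.8281
print(np.corrcoef(r[:, 0], r[:, 1])[0, 1])   # near zero, as (28.107) asserts
```

Agreement is only to within Monte Carlo error and terms of higher order in 1/n.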


28.28. Equation (28.107) applies to a simple non-vanishing correlation. If a canonical correlation vanishes and p = q, the result holds, with the qualification that sample values of r near the zero root must be allowed to have positive or negative values, or alternatively that the distribution of r is that of absolute values of a normal variate (cf. Exercise 28.7). If p < q a zero root is of multiplicity q - p at least. In this case, if it has exactly multiplicity q - p + 2, $nr^2$ is distributed as χ² with q - p + 1 degrees of freedom. For the proof of this result see Hotelling (1936b).

There is another rather curious difficulty in testing the significance of roots of the equation giving the canonical correlations, namely that if several roots exist it is not possible to relate them with certainty to specified parent correlations: any one might have arisen from any one of the parent values. This is not serious for large samples when the roots are distinct, since the sample values cluster closely round the parent values; but for small samples, or for canonical correlations in the parent which are close together, it presents a theoretical problem of a novel kind. See Hotelling (1936b) and Bartlett (1941) on this point.


28.29. We proceed to find the sampling distribution of canonical correlations in the case where the parent values are all zero and the p-variates and q-variates are accordingly independent.

Reverting to equation (28.87) in the form appropriate to samples, we have

$$\left|\lambda^2 a_{\alpha\beta} - a_{\alpha j}a^{jk}a_{k\beta}\right| = 0. \qquad (28.108)$$

We write

$$t_{\alpha\beta} = a_{\alpha j}a^{jk}a_{k\beta} \qquad (28.109)$$

and

$$z_{\alpha\beta} = a_{\alpha\beta} - t_{\alpha\beta}, \qquad (28.110)$$

so that (28.108) becomes

$$\left|\lambda^2(z_{\alpha\beta} + t_{\alpha\beta}) - t_{\alpha\beta}\right| = 0. \qquad (28.111)$$

The significance of this device is that z and t are distributed independently in Wishart's form, as we now proceed to show.

One instructive way of looking at the problem is to consider the regression of the p-way vector y on the q-way vector x. Corresponding to the univariate equation

$$y = bx + e, \qquad (28.112)$$

where e is a residual, we have

$$y_\alpha = b_\alpha^j x_j + e_\alpha, \qquad (28.113)$$

where the b's are given by minimising the sum of n values

$$\Sigma(y_\alpha - b_\alpha^j x_j)^2,$$

namely by

$$\Sigma\{(y_\alpha - b_\alpha^j x_j)\,x_k\} = 0,$$

or, in our notation for canonical variates,

$$a_{\alpha k} - b_\alpha^j a_{jk} = 0,$$

which yields

$$b_\alpha^j = a_{\alpha k}a^{kj}. \qquad (28.114)$$

We may analyse the variance of y in the form

$$\Sigma(y_\alpha y_\beta) = b_\alpha^j b_\beta^k\,\Sigma(x_j x_k) + \Sigma(e_\alpha e_\beta), \qquad (28.115)$$

corresponding to the univariate case

$$\Sigma(y^2) = b^2\,\Sigma(x^2) + \Sigma(e^2),$$

and the two constituents on the right in (28.115) are independent, just as in the univariate case. This may be shown by a direct extension of 22.19.

Furthermore, if we wish to find the linear function of the y's, say $\lambda^\alpha y_\alpha$, which has maximum correlation with the x's, we have to maximise the ratio

$$r^2 = \frac{\lambda^\alpha\lambda^\beta\, b_\alpha^j b_\beta^k a_{jk}}{\lambda^\alpha\lambda^\beta a_{\alpha\beta}}. \qquad (28.116)$$

This is equivalent to maximising unconditionally $\lambda^\alpha\lambda^\beta(b_\alpha^j b_\beta^k a_{jk} - r^2 a_{\alpha\beta})$, which gives

$$\lambda^\alpha(b_\alpha^j b_\beta^k a_{jk} - r^2 a_{\alpha\beta}) = 0,$$

and hence, for $r^2$, the equation

$$\left|b_\alpha^j b_\beta^k a_{jk} - r^2 a_{\alpha\beta}\right| = 0. \qquad (28.117)$$

Now in virtue of (28.114) this reduces to

$$\left|r^2 a_{\alpha\beta} - a_{\alpha j}a^{jk}a_{k\beta}\right| = 0,$$

or

$$\left|r^2 a_{\alpha\beta} - t_{\alpha\beta}\right| = 0, \qquad (28.118)$$

which is equivalent to (28.108) with a slight change of notation. This must be so, for we arrived at both equations on essentially the same assumptions. Now we see that the term $t_{\alpha\beta}$ in the determinant of (28.118) is the first item on the right of the variance analysis given by (28.115), and the other term in the determinant, $a_{\alpha\beta}$, is the sum $\Sigma(y_\alpha y_\beta)$ of the analysis. It follows that z and t of (28.111) are independent, for they are the constituent items of the analysis. Furthermore, the z's will be distributed as sums of squares or products about the means with n - q degrees of freedom, that is in Wishart's form; and similarly the t's are distributed as q sums of squares or products about the origin, i.e. in Wishart's form with n = q + 1.


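28.30. Without loss of generality we may take the parent variances to be unity; the covariances are zero by hypothesis. The joint distribution of z and t is then, from (28.26),

$$dF = \frac{|t|^{\frac{1}{2}(q-p-1)}\,|z|^{\frac{1}{2}(n-q-p-2)}\exp\{-\frac{1}{2}\sum_\alpha(t_{\alpha\alpha} + z_{\alpha\alpha})\}}{2^{\frac{1}{2}p(n-1)}\,\pi^{\frac{1}{2}p(p-1)}\prod_{i=1}^{p}\Gamma\{\frac{1}{2}(q+1-i)\}\,\Gamma\{\frac{1}{2}(n-q-i)\}}\; dt\, dz. \qquad (28.119)$$

In the determinant

$$|\lambda^2(z + t) - t| = 0$$

put $u = \lambda^2$, and let the roots in u be arranged in descending order of magnitude. Consider the distribution for a given value of $t_{\alpha\beta} + z_{\alpha\beta}$, which in particular we take to be $\delta_{\alpha\beta}$. Let us choose new variates $l_{ij}$ from a set obeying the orthogonality conditions

$$\sum_k l_{ik}l_{jk} = 0 \text{ if } i \neq j, \qquad = 1 \text{ if } i = j. \qquad (28.120)$$

Make the transformation

$$t_{ij} = \sum_k l_{ik}l_{jk}u_k, \qquad (28.121)$$

$$z_{ij} = \sum_k l_{ik}l_{jk}(1 - u_k). \qquad (28.122)$$

Instead of the ½p(p + 1) values of $t_{ij}$ we will take the p values of u and ½p(p - 1) of the l's as our new variates. We have

$$|t| = \Big|\sum_k l_{ik}l_{jk}u_k\Big| = \prod_{k=1}^{p}u_k, \qquad (28.123)$$

$$|z| = \prod_{k=1}^{p}(1 - u_k), \qquad (28.124)$$

and have only to consider the Jacobian J. This is clearly of degree ½p(p - 1) in u, for the Jacobian of t and z + t is the same as that of t and z, and only t contributes factors in u in the former. Furthermore, every term $u_i - u_j$ (i < j) is a factor of J. For consider $u_1 - u_2 = 0$, and let us take as our l-variates those for which j > i. Then, to satisfy the conditions on the others derivable from (28.120), a change in the l's corresponding to a rotation in the plane of the first two co-ordinates must be permitted; and when $u_1 = u_2$ such a rotation leaves every $t_{ij}$ unaltered, since

$$\sum_k l_{ik}l_{jk}u_k = u_1(l_{i1}l_{j1} + l_{i2}l_{j2}) + \sum_{k>2}l_{ik}l_{jk}u_k \qquad (28.125)$$

and $l_{i1}l_{j1} + l_{i2}l_{j2}$ is invariant under such a rotation. The derivatives of the t's with respect to the l-variates are therefore linearly dependent when $u_1 = u_2$, and J vanishes. Thus every term $(u_i - u_j)$ occurs in J, and there can be no further factors in u because the power in u is ½p(p - 1).

Substituting in (28.119) we have, integrating out the l-variates,

$$dF = c\prod_{i=1}^{p}\Big\{u_i^{\frac{1}{2}(q-p-1)}(1-u_i)^{\frac{1}{2}(n-p-q-2)}\Big\}\prod_{i<j}(u_i - u_j)\prod_{i=1}^{p}du_i, \qquad (28.126)$$

where

$$c = \frac{k}{\prod_{i=1}^{p}\Gamma\{\frac{1}{2}(q+1-i)\}\,\Gamma\{\frac{1}{2}(n-q-i)\}}. \qquad (28.127)$$

The constant k arises from terms involving n and p in the original density and from the Jacobian. It therefore does not involve q and may be written k(n, p). Evaluation of k by direct integration is a matter of some difficulty, but we may find it indirectly as follows.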
In (28.126), if we increase q and n by 28, the corresponding value of c is 


. (28.126) 

. (28.127) 


fe(» + 2s, p) 



g + 1 + 2« — t 


)^( 



. (28.128) 


The only other term in (28.126) which is affected is that in IT (u) and, with the original 
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c of (28.127), the integral of the distribution so modified would give us the moment of 
oi^er a ot n (»), namely of | * |. This may be found in the manner of 28.15 to be 

■ ^ g + 1 +28— 1^ + 1 — 

'» + 2« + 1 — : 


n 


(see Exercise 28.11). It follows that 


-f- 2s — i 
2 


whence 


rf”’ 

k(n + 2a, p) ^ ^ \ 

* (». P) 


■). 


(28.129) 


k (n 


,p) = /7r(”^)/(p). . . . 


(28.130) 


(28.131) 


(28.132) 


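It remains to evaluate f(p). To do so we make the substitution in (28.126)

$$u_i = \frac{2v_i}{n},$$

letting n tend to infinity. Our distribution becomes

$$dF = \frac{f(p)}{\prod_{i=1}^{p} \Gamma\{\frac{1}{2}(q+1-i)\}} \exp\Big(-\sum_{i=1}^{p} v_i\Big) \prod_{i=1}^{p} v_i^{\frac{1}{2}(q-p-1)} \prod_{i<j} (v_i - v_j) \prod_{i=1}^{p} dv_i.$$

This may be reduced by successive substitutions of the type

$$v_1 = w_1, \qquad v_j = w_j + v_1, \quad j > 1,$$

and choosing q at each stage so that the term in ∏(v) vanishes (as we may, since the result
is independent of q). On integration for w_1, then repeating the process, and so on, we find

$$f(p) = \frac{2^{\frac{1}{2}p(p-1)} \prod_{i=1}^{p} \Gamma\{\frac{1}{2}(p+2-i)\}}{\prod_{i=1}^{p} \Gamma(p+1-i)}.$$

Using the relation

$$\Gamma(x)\, \Gamma(x + \tfrac{1}{2}) = 2^{1-2x} \sqrt{\pi}\, \Gamma(2x),$$

we have

$$f(p) = \frac{\pi^{\frac{1}{2}p}}{\prod_{i=1}^{p} \Gamma\{\frac{1}{2}(p+1-i)\}}. \qquad (28.133)$$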

Thus our distribution is finally

$$dF = c \prod_{i=1}^{p} \left\{ u_i^{\frac{1}{2}(q-p-1)} (1-u_i)^{\frac{1}{2}(n-p-q-2)} \right\} \prod_{i<j} (u_i - u_j) \prod_{i=1}^{p} du_i, \qquad (28.134)$$

where

$$c = \pi^{\frac{1}{2}p} \prod_{i=1}^{p} \frac{\Gamma\{\frac{1}{2}(n-i)\}}{\Gamma\{\frac{1}{2}(q+1-i)\}\, \Gamma\{\frac{1}{2}(n-q-i)\}\, \Gamma\{\frac{1}{2}(p+1-i)\}}, \qquad (28.135)$$

a remarkable form obtained in the general case by Fisher (1939b), P. L. Hsu (1939b), and
Roy (1939b).

We have supposed throughout that q ≥ p. In the contrary case we reverse the roles
of q and p, and hence merely have to interchange p and q in (28.134) and (28.135).
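As a numerical cross-check on the constant, the Python sketch below evaluates c, taken in the form c = π^{p/2} ∏ Γ{½(n − i)} / [Γ{½(q + 1 − i)} Γ{½(n − q − i)} Γ{½(p + 1 − i)}], through log-gamma functions, and compares it with the constants of the special cases q = 1 and q = 2 worked out in 28.31. The function name `log_c` is ours; the swap of p and q when p > q follows the rule just stated.

```python
import math


def log_c(n: int, p: int, q: int) -> float:
    """Natural log of the constant c of the canonical-correlation distribution.

    Written for p <= q; if p > q the roles of p and q are interchanged.
    """
    if p > q:
        p, q = q, p
    total = 0.5 * p * math.log(math.pi)
    for i in range(1, p + 1):
        total += math.lgamma(0.5 * (n - i))
        total -= math.lgamma(0.5 * (q + 1 - i))
        total -= math.lgamma(0.5 * (n - q - i))
        total -= math.lgamma(0.5 * (p + 1 - i))
    return total


# Case q = 1 with, say, n = 20 and p = 3: compare with the Beta-type constant
# Gamma{(n-1)/2} / [Gamma(p/2) Gamma{(n-p-1)/2}].
n, p = 20, 3
lhs = math.exp(log_c(n, p, 1))
rhs = math.gamma(0.5 * (n - 1)) / (math.gamma(0.5 * p) * math.gamma(0.5 * (n - p - 1)))
print(lhs, rhs)
```

The same function with q = 2 should reproduce Γ(n − 2)/{4Γ(n − p − 2)Γ(p − 1)}, the constant of the q = 2 case; working in logarithms avoids overflow of the gamma functions for large n.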

28.31. Let us consider some special cases. When q = 1 the distribution becomes

$$dF = \frac{\Gamma\{\frac{1}{2}(n-1)\}}{\Gamma(\frac{1}{2}p)\, \Gamma\{\frac{1}{2}(n-p-1)\}}\, u^{\frac{1}{2}(p-2)} (1-u)^{\frac{1}{2}(n-p-3)}\, du, \qquad (28.136)$$

confirming the distribution of equation (28.40) leading to Hotelling’s distribution; for
the canonical correlation is then the multiple correlation between the q-variate and the
p-variates; and as the former is measured from its mean there is one fewer degree of
freedom, i.e. n is replaced by n − 1.

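When q = 2 we have

$$dF = \frac{\Gamma(n-2)}{4\, \Gamma(n-p-2)\, \Gamma(p-1)} (u_1 u_2)^{\frac{1}{2}(p-3)} \{(1-u_1)(1-u_2)\}^{\frac{1}{2}(n-p-4)} (u_1 - u_2)\, du_1\, du_2. \qquad (28.137)$$

Writing

$$(1-u_1)(1-u_2) = v, \qquad u_1 + u_2 = w,$$

so that u_1 u_2 = v − 1 + w and the Jacobian of the transformation cancels the factor (u_1 − u_2), we find

$$dF = \frac{\Gamma(n-2)}{4\, \Gamma(n-p-2)\, \Gamma(p-1)} (v - 1 + w)^{\frac{1}{2}(p-3)} v^{\frac{1}{2}(n-p-4)}\, dv\, dw. \qquad (28.138)$$

For given v the limits of w are 1 − v and 2(1 − √v), and integrating for w we find

$$dF = \frac{\Gamma(n-2)}{4\, \Gamma(n-p-2)\, \Gamma(p-1)} \cdot \frac{2}{p-1} (1-\sqrt{v})^{p-1} (\sqrt{v})^{n-p-4}\, dv,$$

or, for √v,

$$dF = \frac{1}{B(n-p-2,\, p)} (1-\sqrt{v})^{p-1} (\sqrt{v})^{n-p-3}\, d\sqrt{v}, \qquad (28.139)$$

a result due to Wilks; cf. equation (28.62).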


28.32. The distribution of the u’s does not immediately provide a test of significance
of the canonical correlations, except when there is only one of them. The criterion

$$z = \prod_{i=1}^{p} (1 - u_i) \qquad (28.140)$$

is sometimes useful in the general case for testing simultaneously the departure of the
u’s from zero. Cf. Exercises 28.11 and 28.12.
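A minimal NumPy sketch of the criterion z, assuming the rows of `x` and `y` hold the n observations on the p- and q-variates. The squared canonical correlations u_i are taken as the non-zero eigenvalues of A₁₁⁻¹A₁₂A₂₂⁻¹A₂₁, built from sums of squares and products about the sample means. The function names and the simulated data are illustrative only.

```python
import numpy as np


def squared_canonical_correlations(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Return the u_i (squared sample canonical correlations), largest first."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    a11 = xc.T @ xc
    a22 = yc.T @ yc
    a12 = xc.T @ yc
    # Eigenvalues of A11^{-1} A12 A22^{-1} A21; only min(p, q) can be non-zero.
    m = np.linalg.solve(a11, a12) @ np.linalg.solve(a22, a12.T)
    u = np.sort(np.linalg.eigvals(m).real)[::-1]
    k = min(x.shape[1], y.shape[1])
    return np.clip(u[:k], 0.0, 1.0)


def z_criterion(x: np.ndarray, y: np.ndarray) -> float:
    """z = product of (1 - u_i) over the canonical correlations."""
    return float(np.prod(1.0 - squared_canonical_correlations(x, y)))


rng = np.random.default_rng(1)
x = rng.normal(size=(100, 2))
y = rng.normal(size=(100, 3)) + 0.5 * x[:, [0]]   # some dependence on the first x-variate
print(z_criterion(x, y))
```

Since the canonical correlations are unaffected by changes of scale, the same z is obtained from the sample correlation matrix as |R|/(|R₁₁||R₂₂|).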


NOTES AND REFERENCES 

Among earlier papers in which various aspects of the multivariate problem began to
be studied, reference may be made to Karl Pearson (1926b) on the “coefficient of racial
likeness” and Ragnar Frisch (1929), who independently arrived at the dispersion matrix
and proposed to call its determinant in standard measure the “scatterance”. Reference
to the papers by Wishart (1928), Wishart and Bartlett (1933c) and Hotelling (1931) on the
generalised product-moment distribution and the generalised “Student” ratio has been
made in the text.

In more recent literature three lines of development are discernible:

(a) American writers have developed the theory of canonical correlation and multiple
analysis mainly on algebraic and analytical lines. See Hotelling (1933, 1936b), Wilks
(1932e, 1934, 1935b, 1935c, 1936, 1943), Girshik (1939), and Madow (1938).

(b) English schools have investigated the theory of discriminant functions and
developed the sampling theory of canonical roots. See R. A. Fisher (1936a, b, 1938c, 1939b,
1940d), P. L. Hsu (1938c, 1939b, 1941a, c, d), and for illustrative material Martin (1936),
Barnard (1935), Fairfield Smith (1936) and Wallace and Travers (1938). See also Bartlett
(1934b, 1938c, 1939b, c, 1941), E. S. Pearson and Wilks (1933b), Welch (1939b), Lawley
(1938) and Bishop (1939). Simaika (1941) has proved that tests based on Hotelling’s T
and the multiple correlation coefficient are uniformly most powerful in the class depending
on a single parameter.

(c) The Indian school, whose contribution has not been referred to in this chapter,
has developed some interesting work based on what is known as the D²-statistic. See
Mahalanobis (1930, 1936a), Mahalanobis, Bose and Roy (1936b), R. C. Bose (1936a), R. C.
Bose and Roy (1938c), and later papers in Sankhyā. If, with two samples from p-variate
populations, d_i is the difference of sample means for the ith variate, the studentised
D²-statistic is

$$D^2 = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} a^{ij} d_i d_j,$$

where a^{ij} refers to the reciprocal of the sample dispersion matrix. Bose and Roy have
shown that in normal samples this has the same distribution as one of Fisher’s forms for
the multiple correlation coefficient. The corresponding parameter for the population

$$\Delta^2 = \frac{1}{p} \sum_{i=1}^{p} \sum_{j=1}^{p} \alpha^{ij} \delta_i \delta_j$$

is known as Mahalanobis’s generalised distance.
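A small NumPy sketch of the studentised D². Two points are assumptions: a^{ij} is taken from the inverse of the pooled within-sample dispersion matrix of the two samples, and the result is divided by p, following Mahalanobis's convention. Because D² is a quadratic form in the mean differences with the inverse dispersion matrix, a non-singular linear transformation of the variates should leave it unchanged, and the usage lines check this.

```python
import numpy as np


def studentised_d2(sample1: np.ndarray, sample2: np.ndarray) -> float:
    """D^2 = (1/p) * sum_ij a^{ij} d_i d_j for two samples (rows = observations).

    Assumes a^{ij} is the inverse of the pooled within-sample dispersion matrix.
    """
    n1, p = sample1.shape
    n2 = sample2.shape[0]
    d = sample1.mean(axis=0) - sample2.mean(axis=0)
    s1 = np.atleast_2d(np.cov(sample1, rowvar=False))   # divisor n1 - 1
    s2 = np.atleast_2d(np.cov(sample2, rowvar=False))   # divisor n2 - 1
    pooled = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)
    return float(d @ np.linalg.solve(pooled, d)) / p


rng = np.random.default_rng(2)
a = rng.normal(size=(40, 3))
b = rng.normal(size=(50, 3)) + np.array([1.0, 0.0, 0.5])
t = np.array([[2.0, 1.0, 0.0], [0.0, 1.0, 3.0], [1.0, 0.0, 1.0]])  # non-singular
print(studentised_d2(a, b), studentised_d2(a @ t, b @ t))
```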


EXERCISES 

28.1. In a four-variate normal distribution show that the correlation between the
covariances a_12 and a_34 is

$$\frac{\rho_{13}\rho_{24} + \rho_{14}\rho_{23}}{\{(1+\rho_{12}^2)(1+\rho_{34}^2)\}^{1/2}}.$$

(Wishart, 1928.)


28.2. For a pair of normal variates with correlation ρ, show that, defining v by


(Txa,(l ~p*)’ 

we have for the frequency function of v 

VS2..-. 
for v > 0, and a similar expression with −v for v inside the curly brackets if v < 0. Here
K is the Bessel function of the second kind with imaginary argument.

(Wishart and Bartlett, 1933c. See also K. Pearson and others, 1929.)


28.3. Show that if k sets of variates a_{ij}^{(h)}, h = 1, . . . k, are each
distributed in Wishart’s form, with sample numbers n_1, . . . n_k, then the variates

$$a_{ij} = \sum_{h=1}^{k} a_{ij}^{(h)}$$

are also distributed in Wishart’s form with n = Σ(n_h). (This follows readily from the
characteristic function. It is a generalisation of the additive properties of χ².)


28.4. If a sample of n is chosen from a p-variate normal population, the variates
being grouped into k classes x_1, . . . x_{p_1}; x_{p_1+1}, . . .; . . . x_p, consider the function

$$W = \frac{|r_{ij}|}{|r'_{ij}|},$$

where r'_{ii} = 1 and r'_{ij} is zero if the variates belong to different classes and equals the
correlation r_{ij} if they belong to the same class.

By considering the function 

A = W‘» 

show that 


« Pi 

p;(W)= nn 




1 




u^-^4 


(Wilks, 1936b. The distribution provides a test of the independence of k sets of normal variates.)


28.5. As a particular case of the last exercise, show that if a single variate x_1 is
independent of a second set x_2, . . . x_p, then



and hence find the distribution of the multiple correlation coefficient when the parent 
coefficient is zero. 


(Wilks, 1936b.)


28.6. Show algebraically that Hotelling’s T is invariant under linear transformations
of the p variates.
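Exercise 28.6 asks for an algebraic proof, but the invariance is easy to check numerically. The sketch assumes the one-sample form T² = n(x̄ − μ₀)′S⁻¹(x̄ − μ₀), with S the sample dispersion matrix (divisor n − 1). It transforms both the data and the hypothetical mean μ₀ by x → Mx + b, with M an arbitrary non-singular matrix chosen for illustration.

```python
import numpy as np


def hotelling_t2(sample: np.ndarray, mu0: np.ndarray) -> float:
    """One-sample Hotelling T^2 for the rows of `sample` against the mean mu0."""
    n = sample.shape[0]
    diff = sample.mean(axis=0) - mu0
    s = np.atleast_2d(np.cov(sample, rowvar=False))
    return float(n * diff @ np.linalg.solve(s, diff))


rng = np.random.default_rng(3)
x = rng.normal(size=(30, 3)) + 0.3
mu0 = np.zeros(3)
m = np.array([[1.0, 2.0, 0.0], [0.0, 1.0, -1.0], [4.0, 0.0, 1.0]])  # non-singular
b = np.array([5.0, -2.0, 1.0])

# Each observation transforms as x -> M x + b, i.e. X @ M.T + b for the data matrix.
t_before = hotelling_t2(x, mu0)
t_after = hotelling_t2(x @ m.T + b, m @ mu0 + b)
print(t_before, t_after)
```

Algebraically, the difference vector becomes M(x̄ − μ₀) and S becomes MSM′, so the factors M cancel in the quadratic form; the check above merely confirms this for one case.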


28.7. If the determinantal equation (28.83) with ρ has a double root equal to
zero, show that for large samples the value of r corresponding to the canonical correlation
is given by omitting all terms in the determinant when expanded, except those in A* and
Noting that the latter is a perfect square, show that r is the ratio of a polynomial
in the sample dispersions to a non-vanishing function regular in the neighbourhood of
zero. Hence show that (28.107) holds when ρ = 0.

(Hotelling, 1936b.)


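28.8. In the notation of 28.23, if

$$A = |a_{ij}|, \qquad B = |a_{\alpha\beta}|, \qquad C = \begin{vmatrix} 0 & a_{i\alpha} \\ a_{\alpha i} & 0 \end{vmatrix}, \qquad D = \begin{vmatrix} a_{ij} & a_{i\alpha} \\ a_{\alpha i} & a_{\alpha\beta} \end{vmatrix},$$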

show that the vector correlation coefficient K defined by

$$K^2 = (-1)^p \frac{C}{AB}$$

and the square of the vector alienation coefficient Z defined by

$$Z = \frac{D}{AB}$$


are invariant under linear transformations of the variates. Also that

$$K = \pm \rho_1 \rho_2 \dots \rho_p, \qquad Z = (1-\rho_1^2)(1-\rho_2^2) \dots (1-\rho_p^2),$$

where the ρ’s are canonical correlations.

(Hotelling, 1936b.)
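The identity Z = ∏(1 − ρ_i²) and the invariance of Z can be checked numerically. The sketch assumes that `a` is a positive-definite dispersion matrix whose first `p` rows and columns belong to the first set. It computes Z = D/(AB) from determinants and compares it with the canonical correlations obtained as square roots of the eigenvalues of A₁₁⁻¹A₁₂A₂₂⁻¹A₂₁. The transformation used is block-diagonal, so each set of variates is transformed only among its own members. The matrices are arbitrary illustrations.

```python
import numpy as np


def alienation_z(a: np.ndarray, p: int) -> float:
    """Z = D / (A B): D = |a|, A = |first-set block|, B = |second-set block|."""
    return float(np.linalg.det(a) / (np.linalg.det(a[:p, :p]) * np.linalg.det(a[p:, p:])))


def canonical_rhos(a: np.ndarray, p: int) -> np.ndarray:
    """Canonical correlations implied by a partitioned dispersion matrix."""
    a11, a12, a22 = a[:p, :p], a[:p, p:], a[p:, p:]
    m = np.linalg.solve(a11, a12) @ np.linalg.solve(a22, a12.T)
    u = np.sort(np.linalg.eigvals(m).real)[::-1][: min(p, a.shape[0] - p)]
    return np.sqrt(np.clip(u, 0.0, 1.0))


rng = np.random.default_rng(4)
g = rng.normal(size=(5, 5))
a = g @ g.T + 5.0 * np.eye(5)          # a positive-definite 5 x 5 dispersion matrix
p = 2
rho = canonical_rhos(a, p)
print(alienation_z(a, p), np.prod(1.0 - rho**2))

# Transform each set separately (x -> T1 x, y -> T2 y); Z should not change.
t = np.zeros((5, 5))
t[:2, :2] = [[1.0, 2.0], [0.5, 3.0]]
t[2:, 2:] = [[2.0, 0.0, 1.0], [1.0, 1.0, 0.0], [0.0, 1.0, 4.0]]
print(alienation_z(t @ a @ t.T, p))
```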


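28.9. In the notation of the previous exercise, k and z being the sample values of
K and Z, show that if the population canonical correlations are all distinct,

$$\operatorname{var} k = \frac{K^2}{n} \sum_{i=1}^{p} \frac{(1-\rho_i^2)^2}{\rho_i^2}, \qquad \operatorname{var} z = \frac{4Z^2}{n} \sum_{i=1}^{p} \rho_i^2, \qquad \operatorname{cov}(k, z) = -\frac{2KZ}{n} \sum_{i=1}^{p} (1-\rho_i^2).$$

In particular, when p = 2,

$$\operatorname{var} k = \frac{1}{n} \{(1-K^2)^2 - Z(1+K^2)\}, \qquad \operatorname{var} z = \frac{4Z^2}{n} (1 - Z + K^2), \qquad \operatorname{cov}(k, z) = -\frac{2KZ}{n} (1 + Z - K^2).$$

(Hotelling, 1936b.)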
28.10. In the previous exercise, with p = q = 2, show that, in standard measure,

*. — •'m 

and hence derive a test of significance of the “tetrad difference” r_13 r_24 − r_14 r_23.

(Hotelling, 1936b.)

28.11. In the notation of Exercise 28.9, show that 




+ 2 ) 5 - 


(Girshik, 1939.) 


28.12. Find the characteristic function of −log z, where z is defined as in the
previous exercise, and hence show that −n log z or, to a better approximation,

l)}log z tends to be distributed as pq degrees of freedom 

when n is large. 

(Bartlett, 1938c.) 
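A sketch of the large-sample test. It assumes Bartlett's commonly used multiplier n − 1 − ½(p + q + 1), with n the number of observations, and computes z from the determinant form |R|/(|R₁₁||R₂₂|) of the sample correlation matrix. The function names are ours. A small simulation under independence with p = 2 and q = 3 should give a mean value of the statistic close to pq = 6, the mean of χ² with pq degrees of freedom.

```python
import numpy as np


def z_from_data(x: np.ndarray, y: np.ndarray) -> float:
    """z = |R| / (|R11| |R22|), i.e. the product of (1 - u_i), from the correlation matrix."""
    p = x.shape[1]
    r = np.corrcoef(np.hstack([x, y]), rowvar=False)
    return float(np.linalg.det(r) / (np.linalg.det(r[:p, :p]) * np.linalg.det(r[p:, p:])))


def bartlett_statistic(z: float, n: int, p: int, q: int) -> tuple[float, int]:
    """Return (statistic, degrees of freedom) for the chi-square approximation."""
    return -(n - 1 - 0.5 * (p + q + 1)) * float(np.log(z)), p * q


rng = np.random.default_rng(5)
n, p, q = 50, 2, 3
stats = []
for _ in range(2000):
    x = rng.normal(size=(n, p))
    y = rng.normal(size=(n, q))   # independent of x: all canonical correlations are zero
    stat, df = bartlett_statistic(z_from_data(x, y), n, p, q)
    stats.append(stat)
print(np.mean(stats), df)
```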



CHAPTER 29


TIME-SERIES (1)


29.1. A time-series, as its name indicates, is a series of values assumed by a variable
at different points of time. We shall consider only cases where the variable is univariate
and shall denote its value at time t by u_t. The study of such series forms an important
branch of statistics because the majority of types of time-variation encountered in practice
are not of the regular functional type in which u_t can be represented exactly by a mathe-
matical function of t, but present in some degree those irregularities of a random character
which can only be discussed in terms of probability. One of our main problems, in fact,
will be to isolate systematic from casual effects in the series so as to be able to study
them separately.


29.2. In general it is possible to observe a time-variable at any instant, and thus
the temporal intervals between successive members of the series need not be the same.
Practice and theory alike, however, usually require the observations to occur at regular
intervals, and in the sequel we shall assume, unless the contrary is specifically stated, that
the interval from each observation to the next is the same throughout the series. As
a matter of convenience we may take this interval as our time-unit and write the series as

u_1, u_2, . . . . . . (29.1)

where t must be an integer. Where a series extends backwards and forwards from some
given point which we wish to regard as origin we may write it as

. . . u_0, u_1, u_2, . . . u_t, . . . . . . (29.2)

In this chapter and the next we shall study the way in which u_t varies with t, such variation
being in general of the stochastic type, that is to say, involving random variables.


Some Examples of Time-series

29.3. Tables 29.1 to 29.5 provide some examples of the kind of variation encountered
in practice. Table 29.1 (illustrated in Fig. 29.1) gives the annual yields per acre of barley
in England and Wales from 1884 to 1939. Table 29.2 (Fig. 29.2) shows the human popula-
tion of England and Wales at ten-yearly intervals from 1811 to 1931. Table 29.3 (Fig. 29.3)
gives the sheep population of England and Wales for each year from 1867 to 1939.
Table 29.4 (Fig. 29.4) gives the annual rainfall in London for each year from 1813 to 1912.
Table 29.5 (Fig. 29.5) gives the average egg-production per laying hen in the U.S.A. for
each month of the years 1938 to 1940.
TABLE 29.1

Annual Yields per Acre of Barley in England and Wales from 1884 to 1939.

(Data from the Agricultural Statistics.)

 Year   Yield per      Year   Yield per      Year   Yield per      Year   Yield per
        acre (cwts.)          acre (cwts.)          acre (cwts.)          acre (cwts.)
 1884     16.2         1898     16.9         1912     14.2         1926     16.0
   85     16.9           99     16.4           13     16.8           27     16.4
   86     15.3         1900     14.9           14     16.7           28     17.2
   87     14.9           01     14.5           15     14.1           29     17.8
   88     15.7           02     16.6           16     14.8           30     14.4
   89     16.1           03     16.1           17     14.4           31     15.0
   90     16.7           04     14.6           18     15.6           32     16.0
   91     16.3           05     16.0           19     13.9           33     16.8
   92     16.6           06     16.8           20     14.7           34     16.9
   93     13.3           07     16.8           21     14.3           35     16.6
   94     16.6           08     16.5           22     14.0           36     16.2
   95     16.0           09     17.3           23     14.5           37     14.0
   96     15.9           10     16.5           24     15.4           38     18.1
   97     16.6           11     15.6           25     15.3           39     17.6

Fig. 29.1. Graph of the Data of Table 29.1 (Barley Yields per Acre).
TABLE 29.2

Population of England and Wales at Ten-Yearly Intervals from 1811 to 1931.

(Data from the Registrar-General’s Statistical Review, 1933, Part II.)

 Year    Population
         (millions)
 1811      10.16
   21      12.00
   31      13.90
   41      15.91
   51      17.93
   61      20.07
   71      22.71
   81      25.97
   91      29.00
 1901      32.53
   11      36.07
   21      37.89
   31      39.95



Fig. 29.2. Graph of the Data of Table 29.2 (Population of England and Wales).




TABLE 29.3

Sheep Population of England and Wales for each Year from 1867 to 1939.

(Data from the Agricultural Statistics.)

 Year   Population    Year   Population    Year   Population    Year   Population
        (10,000)             (10,000)             (10,000)             (10,000)
 1867     2203        1886     1892        1905     1823        1924     1484
   68     2360          87     1919          06     1843          25     1697
   69     2264          88     1863          07     1880          26     1686
   70     2166          89     1868          08     1968          27     1707
   71     2024          90     1991          09     2029          28     1640
   72     2078          91     2111          10     1996          29     1611
   73     2214          92     2119          11     1933          30     1632
   74     2292          93     1991          12     1805          31     1776
   75     2207          94     1859          13     1713          32     1850
   76     2119          95     1856          14     1726          33     1809
   77     2119          96     1924          15     1762          34     1653
   78     2137          97     1892          16     1795          35     1648
   79     2132          98     1916          17     1717          36     1665
   80     1966          99     1968          18     1648          37     1627
   81     1786        1900     1928          19     1512          38     1791
   82     1747          01     1898          20     1338          39     1797
   83     1818          02     1850          21     1383
   84     1909          03     1841          22     1344
   85     1968          04     1824          23     1384





Fig. 29.3. Graph of the Data of Table 29.3 (Sheep Population).
TABLE 29.4

Total Annual Rainfall at London in Inches, for each Year from 1813 to 1912.

(Data from D. Brunt, Phil. Trans. A, 225, 247, 1926.)

 Year   Rainfall     Year   Rainfall     Year   Rainfall     Year   Rainfall
        (inches)            (inches)            (inches)            (inches)
 1813    23.6?       1838    21.63       1863    21.59       1888    27.74
   14    26.07         39    27.49         64    16.93         89    23.85
   15    21.86         40    19.43         65    29.48         90    21.23
   16    31.24         41    31.13         66    31.60         91    28.15
   17    23.65         42    23.09         67    26.25         92    22.61
   18    23.88         43    25.85         68    23.40         93    19.80
   19    26.41         44    22.65         69    25.42         94    27.94
   20    22.67         45    22.76         70    21.32         95    21.47
   21    31.60         46    26.36         71    25.02         96    23.52
   22    23.86         47    17.70         72    33.86         97    22.86
   23    24.11         48    29.81         73    22.67         98    17.69
   24    32.43         49    22.93         74    18.82         99    22.54
   25    23.26         50    19.22         75    28.44       1900    23.28
   26    22.57         51    20.63         76    26.16         01    22.17
   27    23.00         52    36.34         77    28.17         02    20.84
   28    27.88         53    25.89         78    34.08         03    38.10
   29    25.32         54    18.65         79    33.82         04    20.65
   30    26.08         55    23.06         80    30.28         05    22.97
   31    27.76         56    22.21         81    27.92         06    24.26
   32    10.82         57    22.18         82    27.14         07    23.01
   33    24.78         58    18.77         83    24.40         08    23.67
   34    20.12         59    28.21         84    20.35         09    26.75
   35    24.34         60    32.24         85    26.64         10    25.36
   36    27.42         61    22.27         86    27.01         11    24.79
   37    10.44         62    27.67         87    19.21         12    27.88
TABLE 29.5

Average Number of Eggs per Laying Hen in the U.S.A. for each Month of the Years 1938-1940.

(Data from Report of the Bureau of Agricultural Economics, U.S. Dept. of Agriculture, on the
Poultry and Egg Situation, March, 1941.)

 Year   Jan.  Feb.  Mar.  Apr.  May   June  July  Aug.  Sept.  Oct.  Nov.  Dec.
 1938   7.9   9.9   15.4  17.6  17.3  14.9  13.6  11.8  9.4    7.6   5.9   6.4
 1939   8.0   9.7   14.9  17.0  17.0  14.6  13.2  11.7  9.3    7.4   6.0   6.8
 1940   7.2   9.0   14.4  16.5  17.0  14.8  13.4  11.8  9.7    7.9   6.2   6.8



These series are fairly typical of the kind of material with which our theory has to
deal. The data of Table 29.1 (barley yields) present a very irregular fluctuation, and so
far as the eye can see (which is not a decisive test) there is no systematic oscillation and no
regular movement in mean yields over the period. By contrast, Table 29.2 (human popula-
tion) shows a relatively smooth movement without apparent oscillation. Table 29.3 (sheep
population) combines a general decline in numbers with marked oscillatory effects which,
though not perfectly regular, appear to be systematic to some extent. Tables 29.4 and
29.5 exhibit an oscillatory effect which is definitely seasonal for the latter and much less
regular for the former, neither indicating a variation, in the periods covered, of the average
values about which the series oscillate.

29.4. It must not be overlooked that our method of determining the values of the
series at fixed equal intervals of time may suppress evidence of oscillatory movements
which have a period equal to those intervals or to some sub-multiple of them. Suppose,
for instance, that there was a systematic oscillation in the human population expressible
by a harmonic component with period of exactly 10 years, or exactly 5 years, or exactly
3⅓ years. Clearly, by observing the series at 10-yearly intervals we should never find any
evidence of this effect, for it would contribute exactly the same amount to each observation,
without oscillation. In the population case, of course, we have collateral evidence to
indicate that no such oscillation exists, but where nothing is known of the series otherwise
we can never exclude the possibility of a period exactly equivalent to our time-interval.
Sometimes, in fact, we know that it is there, and choose our interval so as to exclude the
oscillation from consideration. For instance, in our sheep population we know that there
is a seasonal effect within the year, which is not brought out in Table 29.3 because the
sheep census is taken on June 4th each year; and again, in the rainfall data of Table 29.4
we have taken as representing the year the whole rainfall within the year, knowing quite
well that rainfall is seasonal to some extent, even in London.
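The point about sub-multiple periods can be seen numerically. The sketch assumes a hypothetical harmonic component cos(2πt/λ + φ) of unit amplitude, with an arbitrary phase φ = 0.3, observed only in the census years 1811, 1821, …, 1931. For λ = 10, 5 and 3⅓ years every observation receives the same contribution, so the sampled values show no oscillation at all. A 7-year period, by contrast, still shows up.

```python
import math


def sampled_harmonic(period: float, times: list[int], phase: float = 0.3) -> list[float]:
    """Values of cos(2*pi*t/period + phase) at the given observation times."""
    return [math.cos(2.0 * math.pi * t / period + phase) for t in times]


census_years = list(range(1811, 1932, 10))  # 1811, 1821, ..., 1931
spreads = {}
for period in (10.0, 5.0, 10.0 / 3.0, 7.0):
    values = sampled_harmonic(period, census_years)
    spreads[period] = max(values) - min(values)
    print(f"period {period:6.3f} years: range of observed contributions = {spreads[period]:.3g}")
```

For the first three periods the printed range is zero apart from rounding error.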

29.5. A general survey of these and similar series suggests that the typical time-
series may be regarded as composed of three parts:

(a) a trend, or long-term movement;

(b) an oscillation about the trend of greater or less regularity;

(c) a “random”, “irregular” or “unsystematic” component.

It is customary to regard the series as composed of these elements superposed one on
another; that is to say, we consider the movement of the series as the sum of three dif-
ferent components which may be generated by different causal systems. Particular series,
of course, need not exhibit them all. That of Table 29.2 (human population) seems
to be almost entirely trend, with perhaps a small unsystematic residual, whereas that of
Table 29.5 (egg production) appears to be entirely oscillatory, and very regularly so.
But some series at least exhibit all three.


29.6. The primary problem of time-series analysis from the statistical viewpoint is
to isolate the three factors for individual study, and in this chapter and the next we shall
be mainly concerned with various methods of carrying out the necessary analysis. Before
proceeding, however, we must look a little more closely into the reality of the effects which
we are investigating and the basis on which we assume that the analysis is legitimate.

29.7. Perhaps the easiest component to understand and to remove from the series
is the seasonal effect. This is a fluctuation imposed on the series by a cyclic phenomenon
external to the main body of causal influences at work upon it. The oscillation in egg-
production in Table 29.5, for instance, reflects the rhythm in the reproductive process
which is found among birds in virtue, ultimately, of the fact that the earth goes round
the sun once a year. (Strictly speaking, we ought to confine the word “seasonal” to those
effects which are annual in period; but where no confusion is likely to arise we can apply
the same word and the same ideas to any phenomenon generated by strictly periodic natural
processes, such as “spring” and “neap” variation in tides or daily variation in tempera-
ture.) We must, however, be careful about extending the notion of seasonality to phenomena
which are not demonstrated beyond reasonable doubt to depend on strictly periodic stimuli.
For instance, it would be going too far, in the present state of our knowledge, to speak of
sunspot variation as seasonal in this sense, and much too far to speak of seasonality in
crop-yields as determined by sunspots, even if the relation between the two were estab-
lished. We return to this point below when defining what we mean by a “cycle”.
29.8. As we noted in 29.4, the seasonal effect may already be removed from the
series by the way in which the data are specified. Where we ourselves have any choice
in the determination of the data, we may eliminate seasonality in the same way, namely,
by selecting for measurement of the series a point of time which is fixed in relation to the
year, such as June 4th for the agricultural returns of England and Wales, or by averaging
over the year, or (what is much the same thing) by cumulating the series over the year,
as for instance with rainfall data.
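For a monthly series such as Table 29.5, cumulating over the year removes the seasonal swing. The sketch below uses the figures of Table 29.5 as printed. The annual totals differ little from year to year. The averages for each calendar month over the three years, by contrast, display the seasonal pattern that the annual figures suppress.

```python
eggs = {  # average eggs per laying hen, Jan. ... Dec. (Table 29.5)
    1938: [7.9, 9.9, 15.4, 17.6, 17.3, 14.9, 13.6, 11.8, 9.4, 7.6, 5.9, 6.4],
    1939: [8.0, 9.7, 14.9, 17.0, 17.0, 14.6, 13.2, 11.7, 9.3, 7.4, 6.0, 6.8],
    1940: [7.2, 9.0, 14.4, 16.5, 17.0, 14.8, 13.4, 11.8, 9.7, 7.9, 6.2, 6.8],
}

# Cumulating over the year: one figure per year, free of the seasonal swing.
annual_totals = {year: round(sum(months), 1) for year, months in eggs.items()}

# Averaging each calendar month over the three years: the seasonal pattern itself.
monthly_means = [round(sum(col) / len(col), 2) for col in zip(*eggs.values())]

print(annual_totals)
print(monthly_means)
```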


29.9. The concept of trend is more difficult to define. Generally, one thinks of it
as a smooth broad motion of the system over a long term of years, but “long” in this con-
nection is a relative term, and what is long for one purpose may be short for another. For
example, if we were examining rainfall records over a hundred years a slow rise from the
beginning of the period to the end would be regarded as a trend; but if we possessed records
for two thousand years (and the rings in some of the giant redwood trees give an index of
climatic conditions for periods of this order) the rise over a particular century might appear
as part of a slow oscillatory movement, so that any inference from the “trend” in a par-
ticular century to the effect that the weather was likely to continue becoming wetter and
wetter might be quite false. What inference we should make in practice would depend
on what we were trying to do. If we were engineers designing a water-supply system and
wished to provide against droughts of reasonable extent, we might perhaps assume that the
trend would last as long as our works and proceed accordingly; but if we were attempting
to study climatic changes over the face of the earth for geological periods of time we should
accept the continuance of the trend with the greatest reserve or, more probably, should
reject it on collateral grounds.


29.10. However long a series may be, we can never be certain, and often not even
reasonably sure, that a trend in it is not part of a slow oscillation, except of course when
the series has terminated (as might, for instance, be the case if we were considering the
lengths of reigns of the Roman Emperors). In speaking of a trend, therefore, we must
bear in mind the length of the series to which our statement refers. Perhaps it would be
more accurate to speak of slow or quick movements rather than of trend and oscillation,
but even so the distinction between the two would remain a matter of subjective judgment
to some extent.

29.11. When seasonal variation and trend have been removed from the data we
are left with a series which will present, in general, fluctuations of a more or less regular
kind. Fig. 29.1 represents the kind of series we obtain, since it has no components of
trend or seasonality. The question then arises, is this residual series systematic in the
sense that its values can be represented as a function of the time? Or, on the other hand,
are the values random in the sense that they could occur, in the observed order, by random
sampling from a homogeneous population? Or again, is there some possibility intermediate
between complete functional variation and complete randomness? The search for syste-
matic effects in residual fluctuation gives rise to several techniques of analysis, the object
of which is to detect whether any part of the series is subject to law, and therefore predict-
able, and whether any part is purely haphazard. The former part we shall call systematic,
and it will be referred to as an “oscillation” (not a “cycle”, which is a very special case
of an oscillation, as we shall see later). The remainder of the series we shall call the unsys-
tematic component, and refer to its movements as “random”. When a series is a mixture
of oscillation and random movement it will not cause any inconvenience to refer to the
up-and-down movement generally as fluctuation before we have analysed it into its con-
stituents; that is to say, we may speak of fluctuation without prejudice to the possibility
of detecting oscillatory movements in it.

In this chapter we study trend and random residuals. In the next chapter we shall
deal with oscillatory and cyclical components.

29.12. The logician or the economist who wants to be difficult can always maintain
that, although any series can be separated into our three specified components as a matter
of mathematical or statistical analysis, the results throw little or no light on the causal
influences at work to produce the series. To such a critic we have to concede, I think,
that in carrying out the analysis we have at the back of our minds the strong possibility
that the three elements are due to independent causal systems. If he refuses to accept
this view (and some economists do), we can only invite him to produce a better statistical
method.

Possibly the reader will feel, on reaching the end of Chapter 30, that we have not been
wasting our time, and that our methods do throw light on the way in which time-series
behave. If not, he should consult some of the references and see whether he finds them
statistically more satisfying.

Determination of Trend 

29.13. It is an essential part of the concept of trend that the movement over fairly
long periods is smooth. This means that we can represent the trend component, at least
locally, by a polynomial in the time element t. Thus, given the series u_t, we may, in the
first instance, seek for some polynomial

$$u_t = a_0 + a_1 t + a_2 t^2 + \dots + a_p t^p \qquad (29.3)$$

which will give an account of the trend movement. By taking p great enough we can, of
course, obtain as close a representation as we like to a finite series; and how large we
take p is a matter for decision in particular cases.

If the polynomial is fitted to the whole series by least squares, it evidently gives the
curvilinear regression line of u_t on the variable t. This method would then lead to the
fitting of regressions in the manner of Chapter 22, and we need not repeat here what has
been said on the subject in that chapter. In Example 22.7 we did, in fact, fit a quartic
to the population data of Table 29.2 and found a good fit.
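Fitting (29.3) to the whole series by least squares is an ordinary polynomial regression of u_t on t. The NumPy sketch below uses an illustrative series, not data from the text: a quadratic trend plus small random disturbances. `numpy.polynomial.Polynomial.fit` rescales t internally for numerical stability, and `convert()` returns the coefficients a_0, …, a_p in ordinary powers of t.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(6)
t = np.arange(1, 41, dtype=float)
u = 5.0 + 0.8 * t - 0.01 * t**2 + rng.normal(scale=0.2, size=t.size)  # illustrative series

p = 2
trend = Polynomial.fit(t, u, deg=p)        # least-squares fit of a_0 + a_1 t + a_2 t^2
a = trend.convert().coef                   # a_0, a_1, a_2 in powers of t
fitted = trend(t)                          # trend values at each t
print(np.round(a, 3))
```

Adding one more observation changes every coefficient and so requires the whole fit to be recomputed, which is the practical objection raised in the next section.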

29.14. It is, however, clear that to obtain a satisfactory trend-curve for data such
as that of Table 29.3 (sheep population), we should have to take a polynomial of rather
high order. This may appear somewhat artificial, and in any case the coefficients of such
a polynomial, being based on high-order moments, would be very unstable from the sampling
viewpoint. A more practical objection, though by no means an unimportant one, is that
if we add another term to the series, as for example if we are keeping an annual series up
to date from year to year, the work of fitting has to be done afresh each time. Moreover,
the trend-line may be affected throughout its length. When, therefore, the series has no
very obvious trend such as that of Table 29.2 it is more convenient to use the simpler
methods described below.
Moving Averages

29.15. An alternative to finding a polynomial which will represent the whole series
is to determine a polynomial which will represent a part of it, and to use different poly-
nomials for different parts. The simplest method, and one which forms the basis of the
majority of methods of trend fitting, is to take the first m terms (m being chosen at will),
fit a polynomial of order p, not greater than m − 1, to them, and use that polynomial to
determine the value in the middle of its range; then to repeat the operation with the m
terms from the second to the (m + 1)th, and so on, moving on one term at each stage.
Unless other considerations require it, we take m to be odd, so that the middle point of
the range corresponds to a value which is actually observed. Otherwise the middle point
falls half-way between two observed values, or we have to use some value of the fitted
polynomial other than the middle point, which results in a loss of useful symmetry.

29.16. Suppose, then, that the number of terms is chosen to be odd and is denoted,
with a slight change of notation, by 2m + 1. Without loss of generality we may denote
the terms by u_{−m}, . . . u_0, . . . u_m. If we choose to fit to them a polynomial of the pth
order (29.3) we may, in the usual way, determine the coefficients by least squares, i.e. solve
the equations

$$\frac{\partial}{\partial a_j} \sum_{t=-m}^{m} (u_t - a_0 - a_1 t - \dots - a_p t^p)^2 = 0, \qquad j = 0, 1, \dots, p, \qquad (29.4)$$

which will give us equations typified by

$$\sum (t^j u_t) - a_0 \sum (t^j) - a_1 \sum (t^{j+1}) - \dots - a_p \sum (t^{j+p}) = 0. \qquad (29.5)$$

Now the sums Σ(t^k) are functions of m only. Thus, if we solve (29.5) for a_0, we shall find
an equation of the form

$$a_0 = c_{-m} u_{-m} + \dots + c_0 u_0 + \dots + c_m u_m, \qquad (29.6)$$

where the c’s depend on m and p, but not on the u’s.

Now u_t assumes the value a_0 at t = 0 and hence this value, as given by (29.6), is the
value we require for the polynomial. As we see, this is equivalent to a weighted average
of the observed values, the weights being independent of which part of the series is taken.
Thus our process of fitting a trend-line consists of determining the constants c (which
depend on m and p and therefore give us a twofold element of choice) and then calculating,
for each consecutive set of (2m + 1) terms in the series, a value given by (29.6). Each
calculated value corresponds to the middle term of its set, so that there will be no values
corresponding to the m terms at the beginning and the m terms at the end.
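The weights c of (29.6) can be computed directly. Since a_0 is the first component of the least-squares solution, the c's form the first row of the pseudo-inverse of the design matrix with columns 1, t, …, t^p for t = −m, …, m. The NumPy sketch below follows that route; the function name is ours.

```python
import numpy as np


def moving_average_weights(m: int, p: int) -> np.ndarray:
    """Weights c_{-m}, ..., c_m of a p-th order moving average on 2m + 1 points."""
    if p > 2 * m:
        raise ValueError("need p <= 2m for a unique least-squares fit")
    t = np.arange(-m, m + 1, dtype=float)
    design = np.vander(t, p + 1, increasing=True)   # columns 1, t, t^2, ..., t^p
    return np.linalg.pinv(design)[0]                # row giving a_0 from u_{-m}, ..., u_m


w = moving_average_weights(3, 3)
print(np.round(21 * w, 6))
```

Because the sums of odd powers of t vanish over a symmetric range, the same weights result for p = 2 and p = 3, or for p = 4 and p = 5. The weights always sum to 1, since a constant series must be reproduced exactly.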


BxampU 29.1 

Suppose we have a series and wish to fit a curve which best appnnpiixn^ ie iets of 
eevmi points ; and suppose we regard a cubic as providing a satisfiioliicjr 6|^pppiXil)}a4ilon. 
/What are the weigl^df the moving average ? > ^ 

Wo have m =JfsxA p ** 3, and c' r pcl;^mi al 1%^ 


» a, t "b 




7 
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Taking our origin at < = 0, we find, for equations (29.6), in virtue of the fact that i: («*) = 0 


for opd k. 

/ 

£(u) = 7ao 

28flt2 




2" {tu) — 

28ai 

+ 1960, 



S (thi) == 28ao 
2 («»«) = 

-|“ lOfiUg 

196ai 

+ 1588a, 

giving, fer o„ 

1 





o, = i-{7i;(«) } 


We may Tvrite this conveniently as 

[-2, 3, 6, 7, 6. 3, -2] y" 

or, when symmetrical formulae are used, as in the present case, by 

[-2, 3, 6, 7 . . . 1, 

denoting the middle term by heavy type. 

To take a simple illustration. Suppose the series is given by the following values : — 

< ; 1 2 3 i ^ 5 l( 7. 8 9 10 

u,i 0 1 8 27'^ 64 126 'J 216 343 6l2 729 

We have, for the trend value at < — 4, * * 

o.-^j{(-2x0) + (3x]) + (6x8)4-(7x27) + (6x64) + (3x125)-(2x216) }= ^-{667} 

== 27. 

Similarly, at < = 6 we find 

a. = 2 ^ ^ + • • • ^ } 

= 126. 

In both cases the trend- value is equal to the actual value of the series, and this obviously 
must be so when we note that we are fitting "It cubic to the series 

Uf == {t — 1 )®. ^ - 

It will be observed that in this example we should have obtained the same value for $u_0$ if we had fitted quadratics instead of cubics; and generally the case of p odd includes the case of the next lower (even) value of p, so that we need not give separate formulae for even p.
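The arithmetic of this illustration is easily mechanised. The following minimal Python sketch, using exact fractions, applies the weights $\frac{1}{21}[-2, 3, 6, 7, 6, 3, -2]$ to every run of seven consecutive terms of $u_t = (t-1)^3$; the helper name `moving_average` is our own and is not notation from the text.

```python
from fractions import Fraction

# Weights of Example 29.1: a cubic fitted to seven points, (1/21)[-2, 3, 6, 7, 6, 3, -2]
WEIGHTS = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]


def moving_average(series, weights):
    """Weighted moving average, one value per complete window.

    Each value belongs to the middle term of its window, so with 2m + 1
    weights the first m and last m terms of the series get no trend value.
    """
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]


u = [(t - 1) ** 3 for t in range(1, 11)]   # t = 1, ..., 10
trend = moving_average(u, WEIGHTS)          # trend values for t = 4, ..., 7
print([int(x) for x in trend])              # [27, 64, 125, 216]
```

With ten terms and seven weights only four trend values result, for t = 4 to 7; because the series is itself a cubic, each coincides with the corresponding $u_t$.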

Labelling each formula by the number of terms, 2m + 1, in the average, and writing out the weights which give $u_0$ calculated in the above manner, we find the following formulae up to p = 5. The reader may care to verify them for himself as an exercise.
Quadratic and Cubic

$$\begin{aligned}
[5] &= \tfrac{1}{35}[-3, 12, \mathbf{17}, \ldots] \\
[7] &= \tfrac{1}{21}[-2, 3, 6, \mathbf{7}, \ldots] \\
[9] &= \tfrac{1}{231}[-21, 14, 39, 54, \mathbf{59}, \ldots] \\
[11] &= \tfrac{1}{429}[-36, 9, 44, 69, 84, \mathbf{89}, \ldots] \\
[13] &= \tfrac{1}{143}[-11, 0, 9, 16, 21, 24, \mathbf{25}, \ldots] \\
[15] &= \tfrac{1}{1105}[-78, -13, 42, 87, 122, 147, 162, \mathbf{167}, \ldots] \\
[17] &= \tfrac{1}{323}[-21, -6, 7, 18, 27, 34, 39, 42, \mathbf{43}, \ldots] \\
[19] &= \tfrac{1}{2261}[-136, -51, 24, 89, 144, 189, 224, 249, 264, \mathbf{269}, \ldots] \\
[21] &= \tfrac{1}{3059}[-171, -76, 9, 84, 149, 204, 249, 284, 309, 324, \mathbf{329}, \ldots]
\end{aligned} \qquad (29.7)$$

Quartic and Quintic

$$\begin{aligned}
[7] &= \tfrac{1}{231}[5, -30, 75, \mathbf{131}, \ldots] \\
[9] &= \tfrac{1}{429}[15, -55, 30, 135, \mathbf{179}, \ldots] \\
[11] &= \tfrac{1}{429}[18, -45, -10, 60, 120, \mathbf{143}, \ldots] \\
[13] &= \tfrac{1}{2431}[110, -198, -135, 110, 390, 600, \mathbf{677}, \ldots] \\
[15] &= \tfrac{1}{46189}[2145, -2860, -2937, -165, 3755, 7500, 10125, \mathbf{11063}, \ldots] \\
[17] &= \tfrac{1}{4199}[195, -195, -260, -117, 135, 415, 660, 825, \mathbf{883}, \ldots] \\
[19] &= \tfrac{1}{7429}[340, -255, -420, -290, 18, 405, 790, 1110, 1320, \mathbf{1393}, \ldots] \\
[21] &= \tfrac{1}{260015}[11628, -6460, -13005, -11220, -3940, 6378, 17655, 28190, 36660, 42120, \mathbf{44003}, \ldots]
\end{aligned} \qquad (29.8)$$
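The rows of (29.7) and (29.8) can be regenerated mechanically. The sketch below is a minimal illustration, not the method by which the printed tables were computed: it solves the least-squares normal equations exactly, with Python's `fractions` module, for a polynomial of degree p fitted to t = -m, ..., m, and returns the weights that give the fitted centre value $a_0$. The helper names `smoothing_weights` and `as_integers` are hypothetical.

```python
from fractions import Fraction
from math import lcm


def smoothing_weights(m, p):
    """Weights w_t (t = -m..m) with a_0 = sum w_t u_t, a_0 being the centre value
    of the least-squares polynomial of degree p through 2m + 1 equally spaced points."""
    ts = range(-m, m + 1)
    n = p + 1
    # Normal equations N a = X'u with N[i][j] = sum over t of t^(i+j).
    # N is symmetric, so a_0 = y'X'u where N y = e_0; the weight of u_t is sum_i y_i t^i.
    aug = [[Fraction(sum(t ** (i + j) for t in ts)) for j in range(n)]
           + [Fraction(1 if i == 0 else 0)] for i in range(n)]
    for col in range(n):  # Gauss-Jordan elimination in exact arithmetic
        piv = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        aug[col] = [x / aug[col][col] for x in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                f = aug[r][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    y = [aug[i][n] for i in range(n)]
    return [sum(y[i] * t ** i for i in range(n)) for t in ts]


def as_integers(weights):
    """Express weights as (common divisor, integer numerators), as in the tables."""
    d = lcm(*(w.denominator for w in weights))
    return d, [int(w * d) for w in weights]


for m, p in [(2, 3), (3, 3), (3, 5), (10, 5)]:
    d, nums = as_integers(smoothing_weights(m, p))
    print(f"[{2 * m + 1}] p={p}: 1/{d} {nums[:m + 1]} ...")
```

Calling `smoothing_weights(m, p - 1)` for odd p returns the same weights as `smoothing_weights(m, p)`, which is the reason given above for not tabulating even p separately.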


29.18. Several methods have been proposed to simplify the arithmetic of fitting a trend-line by moving averages, the large numbers in some of the expressions in (29.7) and (29.8) involving considerable labour in straightforward application. The simplest, perhaps, is that of iterated averages.

Suppose we take an average of sets of four with equal weights (a very simple process) and then another average of the same kind of that average. If the primary series is $u_1, u_2, u_3, \ldots$, the result of the first operation will be to give a series

$$u'_{2.5} = \tfrac14(u_1 + u_2 + u_3 + u_4), \qquad u'_{3.5} = \tfrac14(u_2 + u_3 + u_4 + u_5), \quad \text{etc.},$$

and that of the second operation to give

$$u''_4 = \tfrac14\left(u'_{2.5} + u'_{3.5} + u'_{4.5} + u'_{5.5}\right) = \tfrac{1}{16}(u_1 + 2u_2 + 3u_3 + 4u_4 + 3u_5 + 2u_6 + u_7). \qquad (29.9)$$

We may write this symbolically as

$$\tfrac{1}{16}[1, 2, 3, \mathbf{4}, \ldots], \qquad (29.10)$$

or, reserving the symbol [k] for a simple arithmetic mean of k terms, as

$$[4]^2 = \tfrac{1}{16}[1, 2, 3, \mathbf{4}, \ldots]. \qquad (29.11)$$

Now compare the weights of the average derived in Example 29.1 for fitting a cubic to seven points. Reduced to unit divisors we have for the weights of the latter

-0.0952, 0.1429, 0.2857, 0.3333, ...

and for the weights of (29.9)

0.0625, 0.1250, 0.1875, 0.2500, ...

The two are not identical, but they follow the same sort of course and it might be possible to regard the latter as an approximation to the former. (We shall derive better approximations presently, but this will serve for purposes of illustration.) Now the iterated summation resulting in (29.9) is much easier to carry out than the single weighted averaging process of Example 29.1. Generally, if we can find averages with simple integral weights, preferably unity, which will, in conjunction, give approximations to the more complicated weights of a single average, it is usually easier to use the iteration process.
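Iterating simple averages is equivalent to a single weighted average whose weights are obtained by composing the two weight sequences (a discrete convolution). The sketch below illustrates this under our own naming (`combine` is a hypothetical helper, not a procedure given in the text); it recovers (29.11) and prints the two sets of weights compared above.

```python
from fractions import Fraction


def combine(first, second):
    """Weights of the single average equivalent to applying `first` and then `second`."""
    out = [Fraction(0)] * (len(first) + len(second) - 1)
    for i, x in enumerate(first):
        for j, y in enumerate(second):
            out[i + j] += x * y
    return out


mean4 = [Fraction(1, 4)] * 4          # [4], a simple mean of four terms
iterated = combine(mean4, mean4)      # [4]^2, eq. (29.11)
print([str(w) for w in iterated])     # ['1/16', '1/8', '3/16', '1/4', '3/16', '1/8', '1/16']

cubic7 = [Fraction(w, 21) for w in (-2, 3, 6, 7, 6, 3, -2)]   # Example 29.1
for a, b in zip(cubic7[:4], iterated[:4]):
    print(f"{float(a):8.4f} {float(b):8.4f}")
```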


29.19. In the notation of finite differences, write

$$\Delta u_t = u_{t+1} - u_t, \qquad (29.12)$$

$$E u_t = u_{t+1} = (1 + \Delta)\,u_t, \qquad (29.13)$$

$$\delta u_t = u_{t+\frac12} - u_{t-\frac12}. \qquad (29.14)$$

We have, for the second "central" difference,

$$\delta^2 u_t = (u_{t+1} - u_t) - (u_t - u_{t-1}) = (E - 2 + E^{-1})\,u_t. \qquad (29.15)$$

Writing

$$E = \exp(2i\phi), \qquad (29.16)$$

we find, symbolically,

$$\delta^2 = E - 2 + E^{-1} = \exp(2i\phi) + \exp(-2i\phi) - 2 = \left(e^{i\phi} - e^{-i\phi}\right)^2 = -4\sin^2\phi. \qquad (29.17)$$

Then, for a simple average of k = 2m + 1 terms,

$$[k]\,u_0 = \frac1k\sum_{j=-m}^{m} E^j u_0 = \frac1k\left\{1 + 2\sum_{j=1}^{m}\cos 2j\phi\right\}u_0,$$

since the terms in $\sin 2j\phi$ vanish. Since

$$\sum_{j=-m}^{m} E^j = \frac{\sin(2m+1)\phi}{\sin\phi},$$

we have

$$[k]\,u_0 = \frac{\sin k\phi}{k\sin\phi}\,u_0, \qquad (29.18)$$

where

$$\frac{\sin k\phi}{k\sin\phi} = 1 - \frac{k^2-1}{3!}\sin^2\phi + \frac{(k^2-1)(k^2-3^2)}{5!}\sin^4\phi - \cdots,$$

so that, by (29.17),

$$[k]\,u_0 = u_0 + \frac{k^2-1}{2^2\,3!}\,\delta^2u_0 + \frac{(k^2-1)(k^2-3^2)}{2^4\,5!}\,\delta^4u_0 + \cdots \qquad (29.19)$$

This interesting formula gives the arithmetic average in terms of the middle term $u_0$ and its central differences.

If now our series is approximately represented by a cubic, so that fourth differences vanish, we have

$$[k]\,u_0 = u_0 + \frac{k^2-1}{24}\,\delta^2u_0, \qquad (29.20)$$

and this equation will in any case be true up to third differences. Similarly, for two iterated averages we have, to the same order,

$$[k_1][k_2]\,u_0 = u_0 + \frac{1}{24}\left(k_1^2 + k_2^2 - 2\right)\delta^2u_0, \qquad (29.21)$$

and so on. We will use these results to derive two formulae in very general use by actuaries for "graduating" a series, a process which is very similar to that of fitting a trend-line.
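Formulae (29.20) and (29.21) are exact for any cubic, since all fourth and higher differences then vanish. The following sketch checks this in exact arithmetic for a randomly chosen cubic. `mean_of`, `delta2` and `spencer_start` are hypothetical helpers; for even k the average is taken over half-integer points, which is how the first of two iterated averages of fours is centred in (29.9).

```python
import random
from fractions import Fraction


def mean_of(f, k, centre=Fraction(0)):
    """[k] f at `centre`: the mean of f over k unit-spaced points centred there."""
    return sum(f(centre + Fraction(2 * j - (k - 1), 2)) for j in range(k)) / k


def delta2(f, x=Fraction(0)):
    """Second central difference f(x + 1) - 2 f(x) + f(x - 1), eq. (29.15)."""
    return f(x + 1) - 2 * f(x) + f(x - 1)


rng = random.Random(0)
c = [Fraction(rng.randint(-9, 9)) for _ in range(4)]   # coefficients of an arbitrary cubic


def cubic(t):
    return c[0] + c[1] * t + c[2] * t ** 2 + c[3] * t ** 3


# (29.20): [k] u_0 = u_0 + (k^2 - 1)/24 * delta^2 u_0
for k in (3, 4, 5, 7, 9):
    print(k, mean_of(cubic, k) == cubic(0) + Fraction(k * k - 1, 24) * delta2(cubic))


def spencer_start(x):
    """[4][4][5] applied to the cubic and centred at x."""
    inner = lambda z: mean_of(cubic, 5, z)
    middle = lambda y: mean_of(inner, 4, y)
    return mean_of(middle, 4, x)


# (29.21) extended to three averages: u_0 + (1/24)(15 + 15 + 24) delta^2 u_0
print(spencer_start(Fraction(0)) == cubic(0) + Fraction(9, 4) * delta2(cubic))
```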


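Example 29.2. Spencer's 15-point Formula

Consider three successive averages with equal weights:

$$[4][4][5]\,u_0 = u_0 + \frac{1}{24}\left\{(4^2 - 1) + (4^2 - 1) + (5^2 - 1)\right\}\delta^2u_0 = u_0 + \frac94\,\delta^2u_0.$$

We then have, to third differences,

$$u_0 = [4]^2[5]\left(1 - \tfrac94\,\delta^2\right)u_0.$$

Substituting for $\delta^2$ the formula [1, -2, 1], as given by (29.15), we find

$$u_0 = \frac{1}{320}\,[4]^2[5]\,[-9, 22, -9]\,u_0,$$

where, the divisors being collected into the leading fraction (320 = 4 × 4 × 5 × 4), [4] and [5] here denote the corresponding unweighted sums.

To take some account of fourth or higher differences, we add to the factor [-9, 22, -9] a term $-3\delta^4 = [-3, 12, -18, 12, -3]$, which does not affect the result to third differences. The factor becomes [-3, 3, 4, 3, -3], giving

$$u_0 = \frac{1}{320}\,[4]^2[5]\,[-3, 3, \mathbf{4}, \ldots]\,u_0.$$

This is Spencer's 15-point formula. It covers sets of 15 consecutive terms, the weights in full being

$$\frac{1}{320}[-3, -6, -5, 3, 21, 46, 67, \mathbf{74}, \ldots].$$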
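Spencer's weights in full follow from multiplying out the four factors. A minimal sketch, treating each bracket as an unweighted sum so that the divisor comes out as 320; the cubic used for the final check is an arbitrary choice.

```python
from fractions import Fraction


def convolve(a, b):
    """Weights of two moving sums applied in succession."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


# Unweighted sums [4], [4], [5] and Spencer's factor [-3, 3, 4, 3, -3]
weights = [1]
for factor in ([1] * 4, [1] * 4, [1] * 5, [-3, 3, 4, 3, -3]):
    weights = convolve(weights, factor)
divisor = sum(weights)
print(divisor, weights)   # 320 [-3, -6, -5, 3, 21, 46, 67, 74, 67, 46, 21, 3, -5, -6, -3]


def cubic(t):
    return 2 * t ** 3 - 5 * t ** 2 + t + 7


# The formula reproduces any cubic exactly at the centre of the 15 terms.
centre = 10
smoothed = Fraction(sum(w * cubic(centre + j - 7) for j, w in enumerate(weights)), divisor)
print(smoothed == cubic(centre))   # True
```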


Example 29.3. Spencer's 21-point Formula

In a similar way we find

$$[5]^2[7]\,u_0 = u_0 + 4\,\delta^2u_0,$$

giving, to third differences,

$$u_0 = [5]^2[7](1 - 4\delta^2)\,u_0 = \frac{1}{175}\,[5]^2[7]\,[-4, 9, -4]\,u_0,$$

where, the divisors being collected into the leading fraction (175 = 5 × 5 × 7), [5] and [7] denote unweighted sums. We now add to the factor [-4, 9, -4] the expression

$$-3\delta^4 - \tfrac12\delta^6 = [-3, 12, -18, 12, -3] + [-\tfrac12, 3, -7\tfrac12, 10, -7\tfrac12, 3, -\tfrac12],$$

giving

$$u_0 = \frac{1}{175}\,[5]^2[7]\,[-\tfrac12, 0, \tfrac12, 1, \tfrac12, 0, -\tfrac12]\,u_0 = \frac{1}{350}\,[5]^2[7]\,[-1, 0, 1, \mathbf{2}, \ldots]\,u_0.$$

This is Spencer's 21-point formula.

29.20. A few practical points arising in the application of the foregoing formulae are worth mentioning.

(a) The order in which the iterations are carried out is of course immaterial, as the reader can easily verify. It is therefore more convenient, as a rule, to carry out the more complicated operations first, while the numbers being handled remain small. For instance, in applying the Spencer 15-point formula we should carry out the moving average [-3, 3, 4, 3, -3] first, then apply the simple average [5], and then the two averages of four. This does not apply if the series is short, inasmuch as each averaging shortens the series, so that there are fewer terms in the final than in the initial operations.

(b) The use of a moving average of extent 2k + 1 involves the absence of k terms at the end and k terms at the beginning of the trend-series. If the original series is short the loss may be serious, and this effect sometimes restricts considerably the extent of the average which we are able to apply.

(c) It is possible to remedy the deficiency at the ends of the series by special formulae, but the values so derived have less reliability than those of the main trend-line, and on the whole it seems better to accept the loss of 2k terms unless trend-values for the beginning and end of the series are really essential.
(d) As yet we have given no guide as to the choice of the most suitable values of m and p. In practice we do not usually require to fit curves of degree higher than five, and often a cubic is sufficient, as is assumed in the Spencer formulae. There is greater elasticity in the choice of m, but the point mentioned in (b) above requires m to be as small as possible, consistent with other requirements. We shall see later in the chapter that the variate-difference method gives some further guide as to p, and that certain effects of trend-elimination on random elements bear on the extent determined by m.

(e) There is a voluminous literature on trend-fitting which appears to me out of pro- 
portion to the importance of the subject. It is not difficult to pursue inquiries on the 
above lines to the point of extreme apparent precision and great mathematical complexity, 
and perhaps such work is valuable where the series is fairly smooth and not disturbed 
seriously by sampling variation or superposed random fluctuation. But many of the 
series encountered in statistical practice will not bear the weight of great refinement in 
trend-fitting. The student will probably find that a knowledge of fitting by moving 
averages will be sufficient for all ordinary and many extra-ordinary purposes. 
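Points (a) and (b) can be illustrated with Spencer's 21-point formula. The sketch below applies its factors to a hypothetical random series of 60 terms in two different orders, confirms that the results agree, and shows that ten terms are lost at each end; smoothing a unit impulse then recovers the combined weights. The function name, the seed and the data are illustrative assumptions.

```python
import random


def moving_sum(series, weights):
    """Unnormalised moving sum; the result is len(weights) - 1 terms shorter."""
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]


# Factors of Spencer's 21-point formula, (1/350)[5]^2[7][-1, 0, 1, 2, 1, 0, -1]
FACTORS = [[1] * 5, [1] * 5, [1] * 7, [-1, 0, 1, 2, 1, 0, -1]]

rng = random.Random(2920)
series = [rng.randint(0, 99) for _ in range(60)]   # hypothetical data

complicated_first = series
for f in reversed(FACTORS):        # the weighted factor first, as recommended in (a)
    complicated_first = moving_sum(complicated_first, f)

simple_first = series
for f in FACTORS:                  # the simple sums first
    simple_first = moving_sum(simple_first, f)

trend = [x / 350 for x in simple_first]
print(complicated_first == simple_first)   # True: the order is immaterial
print(len(series), len(trend))             # 60 40: ten terms lost at each end

# Combined weights, obtained by smoothing a unit impulse
combined = [0] * 20 + [1] + [0] * 20
for f in FACTORS:
    combined = moving_sum(combined, f)
print(combined)
```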


The Effect of Trend-elimination on Other Components

29.21. In Table 29.6 we have applied the Spencer 21-point formula to an artificial series obtained by adding a random element to a cubic. Specifically,

$$u_t = (t - 26) + \tfrac{1}{10}(t - 26)^2 + \tfrac{1}{100}(t - 26)^3 + \varepsilon_t. \qquad (29.22)$$


The component $\varepsilon_t$ was taken from tables of random numbers and consists of samples from a population in which all integral values from 0 to 99 are equally frequent. The various columns of the table illustrate the process of fitting, and we may note in passing that for a series as short as this it is convenient to leave the more difficult summations to the last, since there are substantially fewer of them.

Now we know that the Spencer formula will fit a cubic exactly, so that when we subtract the trend from the original series we ought to eliminate the systematic constituent entirely and be left with our random component, except in so far as we have rounded off the systematic element to the nearest unit. A comparison of columns (2) and (9) in Table 29.6, remembering that the latter includes an element 49.5 equal to the mean of the random component, shows that we do not do so. The reason is not far to seek. The moving average has acted on the random element itself and determined a trend-line in it.

The results of applying the Spencer 21-point formula to the random element are shown in column (11). We should expect that if the method were perfect the values in this column would be 49.5, the mean of $\varepsilon_t$, apart from irregular sampling effects; but not only do the observed values deviate from this mean, they do so systematically, the values having a small oscillatory movement which is shown as part of the trend.
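Some entries of Table 29.6 can be re-computed as a check. The sketch below takes columns (3) and (4) for t = 1 to 22 as printed, applies the combined Spencer 21-point weights to obtain columns (8), (9) and (10) at t = 11 and 12 and column (11) at t = 11, and compares column (2) with the cubic $(t-26) + (t-26)^2/10 + (t-26)^3/100$ rounded to the nearest unit. That form of the cubic is an assumption on our part, adopted because it reproduces column (2).

```python
from fractions import Fraction

SPENCER21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
             57, 47, 33, 18, 6, -2, -5, -5, -3, -1]   # divisor 350


def moving_sum(series, weights):
    k = len(weights)
    return [sum(w * x for w, x in zip(weights, series[i:i + k]))
            for i in range(len(series) - k + 1)]


def cubic(t):
    """Systematic part assumed for column (2): x + x^2/10 + x^3/100 with x = t - 26."""
    x = t - 26
    return Fraction(x) + Fraction(x ** 2, 10) + Fraction(x ** 3, 100)


# Columns (3) and (4) of Table 29.6 for t = 1, ..., 22
eps = [23, 15, 75, 48, 59, 1, 83, 72, 59, 93, 76, 24, 97, 8, 86, 95, 23, 3, 67, 44, 5, 54]
u = [-96, -90, -17, -32, -11, -59, 32, 28, 22, 62, 50,
     2, 79, -7, 74, 85, 15, -4, 61, 39, 1, 51]

col2 = [round(cubic(t)) for t in range(1, 23)]          # cubic rounded to the nearest unit
col8 = moving_sum(u, SPENCER21)                          # t = 11, 12
col9 = [round(Fraction(x, 350)) for x in col8]           # trend-values
col10 = [u[10] - col9[0], u[11] - col9[1]]               # deviations
col11 = round(Fraction(moving_sum(eps, SPENCER21)[0], 350))   # graduation of eps at t = 11
print(col8, col9, col10, col11)
```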

29.22. This effect can assume considerable importance, particularly if we are eliminating trend so as to concentrate attention on oscillations. We proceed to examine it more closely.

Suppose that we have a series composed of the sum of three parts, a trend $\phi_1(t)$, an oscillatory term $\phi_2(t)$, and a random element $\phi_3(t)$, so that

$$u_t = \phi_1(t) + \phi_2(t) + \phi_3(t). \qquad (29.23)$$
TABLE 29.6

Series given by Equation (29.22) with Trend-Line determined by a Spencer 21-point Formula.

Columns: (1) t; (2) cubic; (3) ε_t; (4) u_t; (5) [5] sums of (4); (6) [5] sums of (5); (7) [7] sums of (6); (8) [-1, 0, 1, 2, ...] applied to (7); (9) (8)/350; (10) deviation, u_t - (9); (11) graduation of ε_t alone.

  (1)    (2)   (3)    (4)    (5)     (6)      (7)      (8)   (9)  (10)  (11)
    1   -119    23    -96    ...     ...      ...      ...   ...   ...   ...
    2   -105    15    -90    ...     ...      ...      ...   ...   ...   ...
    3    -92    75    -17   -246     ...      ...      ...   ...   ...   ...
    4    -80    48    -32   -209     ...      ...      ...   ...   ...   ...
    5    -70    59    -11    -87    -572      ...      ...   ...   ...   ...
    6    -60     1    -59    -42    -241      ...      ...   ...   ...   ...
    7    -51    83     32     12     162      ...      ...   ...   ...   ...
    8    -44    72     28     85     413    2,233      ...   ...   ...   ...
    9    -37    59     22    194     670    3,801      ...   ...   ...   ...
   10    -31    93     62    164     844    5,120      ...   ...   ...   ...
   11    -26    76     50    215     957    5,984   14,352    41     9    67
   12    -22    24      2    186     996    6,642   15,470    44   -42    66
   13    -18    97     79    198   1,078    7,041   15,815    45    34    63
   14    -15     8     -7    233   1,026    7,145   15,676    45   -52    60
   15    -12    86     74    246   1,071    7,038   14,978    43    31    55
   16    -10    95     85    163   1,069    6,934   14,166    40    45    51
   17     -8    23     15    231     948    6,709   13,379    38   -23    47
   18     -7     3     -4    196     850    6,535   12,703    36   -40    43
   19     -6    67     61    112     892    6,408   12,169    35    26    40
   20     -5    44     39    148     853    6,363   12,102    35     4    39
   21     -4     5      1    205     852    6,446   12,279    35   -34    39
   22     -3    54     51    192     944    6,611   12,676    36    15    39
   23     -2    55     53    195   1,024    6,769   13,228    38    15    40
   24     -2    50     48    204   1,031    7,052   13,857    40     8    41
   25     -1    43     42    228   1,015    7,353   14,508    41     1    42
   26      0    10     10    212   1,050    7,610   15,120    43   -33    43
   27      1    74     75    176   1,136    7,923   15,634    45    30    44
   28      2    35     37    230   1,153    8,249   16,251    46    -9    44
   29      4     8     12    290   1,201    8,607   17,002    49   -37    45
   30      6    90     96    245   1,337    9,019   17,717    51    45    44
   31      9    61     70    260   1,357    9,424   18,499    53    17    44
   32     12    18     30    312   1,373    9,870   19,307    55   -25    43
   33     15    37     52    250   1,462   10,429   20,159    58    -6    42
   34     20    44     64    306   1,541   10,989   21,133    60     4    41
   35     24    10     34    334   1,599   11,679   22,417    64   -30    39
   36     30    96    126    339   1,760   12,539   23,797    68    58    38
   37     36    22     58    370   1,897   13,529   25,737    74   -16    37
   38     44    13     57    411   2,047   14,699   27,955    80   -23    36
   39     52    43     95    443
If we determine the trend by a moving average, denoted by an operation T, then clearly

$$T u_t = T\phi_1 + T\phi_2 + T\phi_3. \qquad (29.24)$$

Let us now suppose that our method of determining trend is perfect in the sense that $T\phi_1 = \phi_1$. Then, on subtracting (29.24) from (29.23) to eliminate trend, we find

$$u_t - T u_t = (\phi_2 - T\phi_2) + (\phi_3 - T\phi_3). \qquad (29.25)$$

The point of present interest is that the terms $T\phi_2$ and $T\phi_3$ in (29.25) may distort the genuinely oscillatory parts of the residual series and induce spurious oscillatory movements.

29.23. Consider the simple case when $\phi_2$ is a sine term, $\sin(\alpha + \lambda t)$, t being integral. Since

$$\sum_{t=1}^{k}\sin(\alpha + \lambda t) = \sin\left\{\alpha + \tfrac12(k+1)\lambda\right\}\frac{\sin\frac12 k\lambda}{\sin\frac12\lambda}, \qquad (29.26)$$

a simple moving average of k consecutive terms will result in a sine series of the same period and phase as the original, but with the amplitude reduced by the factor

$$\frac{1}{k}\,\frac{\sin\frac12 k\lambda}{\sin\frac12\lambda}. \qquad (29.27)$$

Iteration q times will reduce the amplitude by the qth power of this factor.

Thus the term $T\phi_2$ will be small if k is large, q is large, or if $\frac12 k\lambda$ is a multiple of $\pi$, that is, if the extent of the moving average is a period of the oscillation or a multiple of one. But if $\lambda$ is small and $k\lambda$ is small the amplitude is reduced very little and $\phi_2 - T\phi_2$ will largely disappear, i.e. the moving average will partially obliterate the term in $\phi_2$. In this case, $k\lambda$ being small, the extent of the moving average is small compared with the period of the harmonic term, that is to say the oscillation is a slow one. This result is what we should expect. A slow oscillation is treated as a trend by the moving average and eliminated accordingly. Generally, the moving average will emphasise the shorter oscillations at the expense of the longer ones. Furthermore, if the extent of the average is slightly greater than the period, the term (29.27) may have a negative sign, and consequently the difference from the trend may somewhat exaggerate the true oscillations.

It is not so easy to exhibit the precise effect of the moving average when the weights are unequal and the terms are not harmonic, but evidently the same kind of situation is apt to arise.
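A quick numerical check of (29.26) and (29.27): the sketch below averages a sampled sine wave in sets of k and confirms that the result is the same sine wave, centred on the middle term of each set, scaled by the factor (29.27). The period of 12 terms, the phase and the helper names are arbitrary choices for illustration.

```python
import math


def amplitude_factor(k, lam):
    """Factor (29.27) for a simple k-term average: sin(k lam / 2) / (k sin(lam / 2))."""
    return math.sin(k * lam / 2) / (k * math.sin(lam / 2))


def check_sine(k, lam, alpha=0.7, n=200):
    """Largest discrepancy between the k-term moving average of sin(alpha + lam t) and
    factor * sin(alpha + lam * (middle t of the window)); zero up to rounding."""
    x = [math.sin(alpha + lam * t) for t in range(n)]
    factor = amplitude_factor(k, lam)
    return max(abs(sum(x[i:i + k]) / k - factor * math.sin(alpha + lam * (i + (k - 1) / 2)))
               for i in range(n - k + 1))


lam = 2 * math.pi / 12   # an oscillation with a period of 12 terms
for k in (3, 5, 12, 13):
    print(k, round(amplitude_factor(k, lam), 4), check_sine(k, lam) < 1e-12)
```

For this period the factor vanishes when k = 12 and is negative, about -1/13, when k = 13, in line with the remarks above.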

29.24. Now consider the effect of a simple moving average (that is, one with equal weights) on the residual element $\phi_3$, which we will suppose to be a random element $\varepsilon$ with variance v. For the term $T\phi_3$ we have

$$T\phi_3(t) = \frac{1}{k}\sum_{j=t-[\frac12 k]}^{\,t-[\frac12 k]+k-1}\varepsilon_j, \qquad (29.28)$$

where $[\frac12 k]$ is the greatest integer which does not exceed $\frac12 k$. Consecutive values of $\varepsilon$ are independent, but consecutive values of $T\phi_3$ are not; for $T\phi_3(a)$ and $T\phi_3(b)$, with a > b, have k - (a - b) values of $\varepsilon$ in common and are correlated if a - b < k. Thus the series $T\phi_3$ will be much smoother than $\phi_3$, and if we proceed to further averagings it will become smoother still. We have had an example of this effect in Table 29.6, and shall meet further examples below.
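The correlation between members of $T\phi_3$ can be written down for any weights: terms d apart of a simple average share k - d of the $\varepsilon$'s, so their correlation is (k - d)/k. The sketch below compares this with a simulated nine-term average of normal errors; the seed, the sample size and the helper names are arbitrary assumptions.

```python
import random


def lag_correlation(weights, d):
    """Correlation between terms d apart of a moving average of independent errors
    of equal variance: sum a_j a_(j+d) / sum a_j^2, which is zero once d >= k."""
    num = sum(a * b for a, b in zip(weights, weights[d:]))
    return num / sum(a * a for a in weights)


def sample_correlation(x, d):
    """Product-moment correlation between x_t and x_(t+d)."""
    a, b = x[:-d], x[d:]
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5


k = 9
rng = random.Random(7)
eps = [rng.gauss(0, 1) for _ in range(50_000)]
smooth = [sum(eps[i:i + k]) / k for i in range(len(eps) - k + 1)]   # T phi_3 for [9]
for d in (1, 2, 4, 8, 9):
    print(d, round(lag_correlation([1 / k] * k, d), 3), round(sample_correlation(smooth, d), 3))
```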
29.25. The effect of taking a moving average of a random series will then be to generate an oscillatory series, provided the weights are such as to give a positive correlation between successive members of the generated series, a condition which is always realised in moving averages employed for trend-fitting. We shall call this the Slutzky-Yule effect, after the two statisticians who (independently) studied it in detail.

The generated series is not regular in the cyclical sense, that is to say its peaks and troughs do not recur at equal intervals of time, and the amplitudes of the oscillations vary considerably. Nevertheless such oscillations present a striking resemblance to the kind of movement which is found in practice, particularly in economic time-series, and we shall consider them in more detail in Chapter 30. For our present purposes we require to consider how far the process of trend-elimination itself may generate such effects, in order to be sure that oscillatory movements in a trend-free series have not been put there, so to speak, by our own arithmetical processes.


29.26. For this purpose we shall consider the period and variance of a series generated by the Slutzky-Yule effect.

Since the peaks and troughs do not recur at equal intervals there is no quantity which we can conveniently call the length of the oscillation. There will, in fact, be a distribution of lengths. We may define as the mean length either the mean period from peak to peak, or that from trough to trough; but this raises some difficulties as to whether we are prepared to admit as periods small ripples on the main undulation.

Recognising its somewhat arbitrary character, we shall take as our measure of oscillatory length the mean distance between upcrosses, that is to say the mean distance between points where the series changes sign from negative to positive, crossing the x-axis upwards. Suppose the series is generated by a moving average with weights $a_1, \ldots, a_k$ of a random variable $\varepsilon$ which is normally distributed with variance v. Then the probability that

$$\eta_1 = \sum_{j=1}^{k} a_j\varepsilon_j < 0 \qquad (29.29)$$

and

$$\eta_2 = \sum_{j=1}^{k} a_j\varepsilon_{j+1} > 0, \qquad (29.30)$$

i.e. that the generated series changes sign from negative to positive, is the proportional frequency of

$$\frac{1}{(2\pi v)^{(k+1)/2}}\int\cdots\int\exp\left(-\frac{1}{2v}\sum_{j=1}^{k+1}\varepsilon_j^2\right)d\varepsilon_1\cdots d\varepsilon_{k+1} \qquad (29.31)$$

between the hyperplanes $\sum_{j=1}^{k} a_j\varepsilon_j = 0$ and $\sum_{j=1}^{k} a_j\varepsilon_{j+1} = 0$. Since this density is spherically symmetric, the frequency is $\theta/2\pi$, where $\theta$ is the angle between these two planes, which is given by

$$\cos\theta = \frac{\sum_{j=1}^{k-1} a_j a_{j+1}}{\sum_{j=1}^{k} a_j^2}. \qquad (29.32)$$

Hence the mean distance between upcrosses is $2\pi/\theta$, where $\theta$ is given by (29.32).
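Formula (29.32) is easily evaluated, and can be checked against a simulated series. The sketch assumes normal errors with mean zero, so that an upcross is a change of sign from negative to positive, and measures the mean distance as (last upcross - first upcross)/(number of upcrosses - 1); the helper names, sample size and seed are hypothetical. For the weights [1, 2, 3, 3, 3, 2, 1] and for a simple nine-point average it should give about 15.5 and 13.2 respectively.

```python
import math
import random


def mean_upcross_distance(weights):
    """2*pi/theta with cos(theta) = sum a_j a_(j+1) / sum a_j^2, eq. (29.32)."""
    rho = sum(a * b for a, b in zip(weights, weights[1:])) / sum(a * a for a in weights)
    return 2 * math.pi / math.acos(rho)


def simulated_upcross_distance(weights, n=200_000, seed=1):
    """Mean spacing of upcrosses in a moving average of normal noise with mean zero."""
    rng = random.Random(seed)
    eps = [rng.gauss(0, 1) for _ in range(n)]
    k = len(weights)
    eta = [sum(a * e for a, e in zip(weights, eps[i:i + k])) for i in range(n - k + 1)]
    ups = [i for i in range(len(eta) - 1) if eta[i] < 0 < eta[i + 1]]
    return (ups[-1] - ups[0]) / (len(ups) - 1)


for w in ([1, 2, 3, 3, 3, 2, 1], [1] * 9):
    print(w, round(mean_upcross_distance(w), 1), round(simulated_upcross_distance(w), 1))
```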
29.27. In a similar way, the probability that

$$\eta_{k+1} - \eta_k < 0 \qquad (29.33)$$

and

$$\eta_k - \eta_{k-1} > 0, \qquad (29.34)$$

that is, that $\eta_k$ is a peak of the series, is $\theta_1/2\pi$, where $\theta_1$ is the angle between the two hyperplanes

$$\eta_k - \eta_{k-1} = 0, \qquad (29.35)$$

$$\eta_{k+1} - \eta_k = 0, \qquad (29.36)$$

each of which is linear in the $\varepsilon$'s, and is given by

$$\cos\theta_1 = \frac{(a_2 - a_1)a_1 + (a_3 - a_2)(a_2 - a_1) + \cdots + (a_k - a_{k-1})(a_{k-1} - a_{k-2}) - a_k(a_k - a_{k-1})}{a_1^2 + (a_2 - a_1)^2 + \cdots + (a_k - a_{k-1})^2 + a_k^2}. \qquad (29.37)$$

Thus the mean distance between peaks is $2\pi/\theta_1$. The same formula obviously applies to the mean distance between troughs.
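The corresponding calculation for peaks, (29.37), can be sketched in the same way. The function below (a hypothetical helper) forms the weights of the first-difference series and returns $2\pi/\theta_1$. For any simple average of two or more terms, successive differences share no $\varepsilon$ and so are uncorrelated; then $\theta_1 = 90^\circ$ and the mean distance between peaks is 4.

```python
import math


def mean_peak_distance(weights):
    """2*pi/theta_1 from eq. (29.37). The difference series eta_(t+1) - eta_t has weights
    -a_1, a_1 - a_2, ..., a_(k-1) - a_k, a_k; b below is that list with signs reversed,
    which leaves cos(theta_1) unchanged."""
    b = [weights[0]] + [y - x for x, y in zip(weights, weights[1:])] + [-weights[-1]]
    rho = sum(p * q for p, q in zip(b, b[1:])) / sum(p * p for p in b)
    return 2 * math.pi / math.acos(rho)


print(round(mean_peak_distance([1, 2, 3, 3, 3, 2, 1]), 2))   # 7.47
print(mean_peak_distance([1] * 9))                          # 4.0
```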


29.28. If we wish to exclude "ripples" of a certain length d from consideration we may inquire for the probability that (29.33) and (29.34) are satisfied in conjunction with

$$\eta_k > \ldots \qquad (29.38)$$

This is evidently the area cut off on the unit sphere by the three planes (29.35), (29.36) and the plane

$$\ldots = 0 \qquad (29.39)$$

bounding (29.38), divided by $4\pi$, the area of the whole sphere. If the angles between the planes are A, B and C this area is $A + B + C - \pi = \theta_2$, say. The mean length between peaks, ripples excepted, is then $4\pi/\theta_2$.

Example 29.4

In Table 29.7 we show 480 terms of a series of random numbers which can take integral values from 0 to 19, together with a moving sum of fives of a moving sum of threes. Fig. 29.6 shows a portion of the derived series graphically. There are 474 terms of the smoothed series.

The mean value of our series is 15 × 9.5 = 142.5. The number of upcrosses will be found from the table to be 23, the first between the 19th and 20th terms of the smoothed series, the last between the 459th and the 460th. The mean distance between upcrosses is then 440/22 = 20 units. How does this compare with the mean distance given by "normal" theory?

The weights of the graduation are [1, 2, 3, 3, 3, 2, 1] and from (29.32) we have

$$\cos\theta = \frac{1\cdot 2 + 2\cdot 3 + \cdots + 2\cdot 1}{1^2 + 2^2 + \cdots + 1^2} = \frac{34}{37} = 0.9189, \qquad \theta = 23^\circ 14'.$$

Hence the mean distance = 360/23.233 = 15.5 units.

TABLE 29.7

Series of 480 Terms of a Rectangular Random Series ε and a [5][3] Smoothing S.
Fig. 29.6. Graph of the Last 117 Terms of the Series S of Table 29.7.
The observed mean distance is 20.0 units, but this is based on rectangular variation, and we are, perhaps, entitled to expect some difference from normal theory. For rectangular random variables, values distant from the mean occur more frequently, and it is not surprising to find oscillations in the series which do not result in upcrosses.

The number of peaks in the series will be found to be 62, the first at the seventh term, the last at the 466th. Hence the mean distance between peaks is 459/61 = 7.5 units. From formula (29.37) we find

$$\cos\theta_1 = \tfrac46, \qquad \theta_1 = 48^\circ 11'.$$

Thus the theoretical mean distance is 360/48.187 = 7.5 units, in good agreement with experiment. It will be observed that several of the distances between peaks are due to very small ripples.

From a number of experiments Dodd (1939a) concluded that series generated from rectangular material conformed fairly well to normal theory.
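Example 29.4 is easily repeated on a larger scale. The sketch below generates a long hypothetical series of integers uniformly distributed on 0 to 19 (not the series of Table 29.7), forms the same moving sums of fives of sums of threes, and measures the mean spacing of upcrosses of 142.5 and of peaks, counting a peak only where a value strictly exceeds both neighbours. With many terms the results should lie near the normal-theory values 15.5 and 7.5, ties between neighbouring integer values lowering the count of peaks a little. The seed and sample size are arbitrary.

```python
import random


def spacing(indices):
    """Mean distance between events, measured from the first to the last, as in the text."""
    return (indices[-1] - indices[0]) / (len(indices) - 1)


def rectangular_experiment(n=100_000, seed=3):
    """Moving sum of fives of a moving sum of threes of integers uniform on 0..19.

    Returns the mean spacing of upcrosses of 142.5 (the mean of the smoothed series)
    and the mean spacing of peaks.
    """
    rng = random.Random(seed)
    e = [rng.randint(0, 19) for _ in range(n)]
    threes = [sum(e[i:i + 3]) for i in range(n - 2)]
    s = [sum(threes[i:i + 5]) for i in range(len(threes) - 4)]   # weights [1,2,3,3,3,2,1]
    ups = [i for i in range(len(s) - 1) if s[i] < 142.5 < s[i + 1]]
    peaks = [i for i in range(1, len(s) - 1) if s[i - 1] < s[i] > s[i + 1]]
    return spacing(ups), spacing(peaks)


up_gap, peak_gap = rectangular_experiment()
print(round(up_gap, 1), round(peak_gap, 1))
```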


29.29. Let us now examine how the variance of the induced oscillation compares with the variance of the original random series.

The sum of k random elements with variance v has variance kv and its mean has variance v/k. It does not follow that a simple moving average has a variance 1/k times that of the random element, because of correlations between successive members in the derived series. If the original series was $\varepsilon_1, \ldots, \varepsilon_n$, the derived series is, with weights $a_1, \ldots, a_k$,

$$\begin{aligned}
a_1\varepsilon_1 + a_2\varepsilon_2 + \cdots + a_k\varepsilon_k &= \eta_1 \\
a_1\varepsilon_2 + a_2\varepsilon_3 + \cdots + a_k\varepsilon_{k+1} &= \eta_2 \\
&\;\;\vdots \\
a_1\varepsilon_{n-k+1} + a_2\varepsilon_{n-k+2} + \cdots + a_k\varepsilon_n &= \eta_{n-k+1}.
\end{aligned} \qquad (29.40)$$

The expected value of the sum of these values is zero since the expected value of $\varepsilon$ may be taken to be so. Since there are n - k + 1 terms we have for the variance

$$\frac{1}{n-k+1}\sum_{j=1}^{n-k+1}\eta_j^2. \qquad (29.41)$$

The expected value of this, since the $\varepsilon$'s are independent, is

$$E\left\{\frac{1}{n-k+1}\sum\eta_j^2\right\} = E(\eta_j^2) = \left(a_1^2 + a_2^2 + \cdots + a_k^2\right)v. \qquad (29.42)$$

In particular, if the a's are all equal to 1/k, the expected value of the variance is v/k. This gives us the average reduction in the variance.

If a simple average of extent k is iterated q times the weights are the successive coefficients in

$$\frac{1}{k^q}\left(1 + x + \cdots + x^{k-1}\right)^q.$$

The sum of squares of these coefficients is the coefficient of $x^{q(k-1)}$ in

$$\frac{1}{k^{2q}}\left(1 + x + \cdots + x^{k-1}\right)^{2q} = \frac{1}{k^{2q}}\,\frac{(1 - x^k)^{2q}}{(1 - x)^{2q}}, \qquad (29.43)$$

and this gives the average reduced variance, as a multiple of v, for a simple average of k iterated q times. The following are the values of the reducing factor for some of the values of k and q:


 k \ q     1      2      3      4      5
   3     0.33   0.23   0.19   0.17   0.15
   4     0.25   0.17   0.14   0.12   0.11
   5     0.20   0.14   0.11   0.10   0.09
   6     0.17   0.11   0.09   0.08   0.07
   7     0.14   0.10   0.08   0.07   0.06


Evidently the result of the first moving average is to generate a series with a much lower variance than that of the original random element, but the second and succeeding iterations do not reduce the variance further to the same extent. In the case k = 7 the first averaging reduces the variance to one-seventh, but the next three reduce it only by a further half.
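The reducing factors can be computed exactly by multiplying out $(1 + x + \cdots + x^{k-1})^q$ and summing the squares of the coefficients, which is equivalent to taking the middle coefficient in (29.43). A minimal sketch, using `fractions` so that small cases can be compared with exact values; it should reproduce the table above to two decimals. The function name is our own.

```python
from fractions import Fraction


def reducing_factor(k, q):
    """Sum of squared weights of a simple k-term average iterated q times, eq. (29.43):
    the expected variance of the smoothed random series as a multiple of v."""
    coeffs = [1]
    for _ in range(q):   # multiply by 1 + x + ... + x^(k-1)
        coeffs = [sum(coeffs[max(0, j - k + 1):j + 1]) for j in range(len(coeffs) + k - 1)]
    return Fraction(sum(c * c for c in coeffs), k ** (2 * q))


print("k\\q " + " ".join(f"{q:5d}" for q in range(1, 6)))
for k in range(3, 8):
    print(f"{k:3d} " + " ".join(f"{float(reducing_factor(k, q)):5.2f}" for q in range(1, 6)))
```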

29.30. To apply such results in practice we require an estimate of the variance of the random element in the original series. If this is available we can estimate the variance of the generated series and also, from 29.26 and 29.27, the mean distance between upcrosses or between peaks. If then our residual series, after the elimination of trend, showed an oscillatory movement with this variance and these mean distances, within sampling limits, we could not conclude that the oscillatory effect was real. It could have been induced by our method of eliminating trend.

In the present state of knowledge it is not possible to assign permissible limits of sampling variation by relation to standard errors in the usual way. Whether any particular effect is significantly different from the values of the series generated from the random element remains, therefore, to some extent a matter of subjective judgment. The sampling problems involved are formidable, but there does not seem any reason why they should not be capable of explicit solution. This field of study awaits the attention of the theorist.

Example 29.5

For the data of Table 29.3 (sheep population of England and Wales) trend was eliminated by a simple average of nines, the resulting residuals being shown in Table 29.8. A glance at the series suggests some sort of oscillatory effect, since the signs of terms cluster together. By the methods of the next chapter the effect may be brought into greater prominence. The data themselves, however, indicate a mean distance between upcrosses of about 8 or 9 years, and actual calculation gives a variance of 8474. Can this be due to the operation of our trend-elimination on a random element in the original series?

For the mean distance between upcrosses due to a simple nine-point average we have

$$\cos\theta = \tfrac89, \qquad \theta = 27^\circ 16',$$

and the mean distance is 360/27.27 = 13.2 approximately. This is considerably in excess of our observed value, but not sufficiently so to reject outright the possibility we are examining.

Since, however, the variance of residuals is 8474, this must, to have been generated from a random series by a simple average of nines, derive from a random element with variance 9 × 8474 = 76,266.

TABLE 29.8

Residual Values of the Sheep Series of Table 29.3 after Elimination of Trend by a Simple Nine-Point Moving Average.

(Residuals in units of 10,000; ? = sign illegible.)

Year   Residual     Year   Residual     Year   Residual
1871     ?176       1893      +34       1915      +19
1872     -112       1894     -103       1916     +128
1873      +50       1895     -104       1917      +97
1874     ?141       1896      ?16       1918      +69
1875      +60       1897      -23       1919      -29
1876      -20       1898      +17       1920     ?174
1877      +12       1899      +71       1921     ?107
1878      +82       1900      +36       1922     ?142
1879     ?130       1901      +16       1923     ?109
1880      -14       1902      ?27       1924      -23
1881     ?166       1903      -32       1925      +60
1882     ?179       1904      -49       1926     +121
1883      ?84       1905      ?61       1927      +94
1884      +38       1906      -52       1928      -25
1885      +97       1907      -24       1929      -90
1886       +8       1908      +68       1930      ?75
1887       ?6       1909     ?141       1931      +72
1888     ?106       1910     +119       1932     +152
1889      ?99       1911      +66       1933     +112
1890      +35       1912      -52       1934      -64
1891     +169       1913     -117       1935      -87
1892     +167       1914      ?61

An estimate of the variance of the random element in the original series, obtained by the variate-difference method which we describe below, was only 350 approximately. Making every allowance for sampling effects, we cannot do otherwise than reject decisively the possibility that the residual oscillation is spurious in the sense of having been induced into the data by the effect of the elimination of trend on a random element.

29.31. We may summarise the foregoing discussion of trend-elimination as follows:

(a) The conception of a trend as a "smooth" or "regular" movement is equivalent to the supposition that trend can be represented, at least locally, by a smooth mathematical function and in particular by a polynomial in the time-variable.

(b) Certain series can be treated on lines formally equivalent to regression analysis; but a more generally applicable procedure is to represent the trend by a moving parabolic arc.

(c) The moving arc of best fit in the least-squares sense gives values which are derivable from a moving average of the data. The weights of this average are to some extent at choice, according to the extent of the average and the closeness of fit required in the moving arc.

(d) A moving average of extent k sacrifices (k - 1) terms, in the sense that the derived series is (k - 1) terms shorter than the original series. If the series is short it is usually desirable to keep this loss to a minimum, that is, to keep the extent of the average as short as possible.

(e) A moving average may distort genuine oscillatory effects, in general exaggerating the shorter variations at the expense of the longer ones, and may induce spurious oscillatory phenomena by its action on random residuals. For harmonic components the effect is minimised by making the average a simple one with extent equal to the period of the component. For random components the effect is minimised by making the sum of squares of the weights in the average a minimum, i.e. by using a simple average.


29.32. In the theory of time-series there are very few rules which can be laid down without a good deal of proviso and caveat. It will be evident from the foregoing that there is no golden rule in trend-fitting which can be applied irrespective of individual circumstances. If we desire to get a close fit to the data we must use a parabola of fairly high order, which involves a moving average with weights which are far from equal. This, however, increases the danger of obscuring the true oscillations in the residuals. In most practical cases it is necessary to strike a balance between conflicting requirements by intuitive judgment as to the appropriate moving average to use.

Variate-difference Method

29.33. We now proceed to consider the random constituent of a time-series. From 
the very nature of random variation wc? (iannot (jxpc^ct to derive any formula, however 
approximate, which will measure the random component directly at any given point of 
the series. The best we can hope to do is to determine the non-random components and 
I to obtain a random residual which is left unaccounted for by those components ; and even 
this, as we shall see in the next chapter, is not a very strong hope when oscillations appear 
in the series. 

On certain assumptions, however, we may determine the variance of the random
component and hence obtain a general idea of its magnitude and importance. Suppose
that the systematic part of the series can be represented, at least locally, by a polynomial.
Then successive differencing of the series will gradually eliminate the polynomial element
(a polynomial of degree k is reduced to a constant by k differencings and to zero by the next)
but will not reduce the random component correspondingly. As we proceed with the
differencing, the random element becomes more and more predominant until finally the
systematic component is negligible. Hence we can determine effectively the variance of the
random component in the differenced series, and by a simple calculation derive an estimate
of that in the original series.
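The following small Python illustration assumes a cubic systematic part with arbitrary coefficients, made up for the example. Three differencings reduce the cubic to a constant and a fourth removes it. An isolated random shock, on the other hand, is not removed; it is spread over r + 1 terms with binomial weights. The helper names are hypothetical.

```python
def forward_diff(xs):
    """One application of the difference operator: x_{t+1} - x_t."""
    return [b - a for a, b in zip(xs, xs[1:])]

def nth_diff(xs, r):
    """Apply the difference operator r times."""
    for _ in range(r):
        xs = forward_diff(xs)
    return xs

# Illustrative cubic "systematic part" (coefficients chosen arbitrarily).
cubic = [2 * t**3 - 5 * t**2 + t - 7 for t in range(20)]
for r in range(1, 5):
    print(r, nth_diff(cubic, r)[:5])   # third differences all equal 12; fourth all 0

# A single random "shock" is not eliminated: its 4th differences keep a sum of squares of 70.
impulse = [0] * 10 + [1] + [0] * 10
print(sum(x * x for x in nth_diff(impulse, 4)))
```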


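29.34. Consider the differencing of a random series $\varepsilon_t$. We have

$$\Delta\varepsilon_t = \varepsilon_{t+1} - \varepsilon_t, \qquad (29.44)$$

$$\Delta^r\varepsilon_t = \varepsilon_{t+r} - \binom{r}{1}\varepsilon_{t+r-1} + \binom{r}{2}\varepsilon_{t+r-2} - \dots + (-1)^r\varepsilon_t. \qquad (29.45)$$

Without loss of generality we may suppose that the mean value of $\varepsilon_t$ is zero, and thus

$$E(\Delta^r\varepsilon_t) = 0. \qquad (29.46)$$

Hence, since the $\varepsilon$'s are independent with common variance $v$, so that the cross-products have zero expectation,

$$\operatorname{var}(\Delta^r\varepsilon_t) = E(\Delta^r\varepsilon_t)^2 = E\left\{\varepsilon_{t+r}^2 + \binom{r}{1}^2\varepsilon_{t+r-1}^2 + \dots + \varepsilon_t^2\right\} = v\left\{1 + \binom{r}{1}^2 + \binom{r}{2}^2 + \dots + 1\right\}.$$

The sum in curly brackets is easily evaluated from the consideration that it is the coefficient
of $x^r$ in $(1 + x)^r(x + 1)^r$, that is, it equals $\binom{2r}{r}$. Hence

$$\operatorname{var}(\Delta^r\varepsilon_t) = \binom{2r}{r}v. \qquad (29.47)$$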


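We may then derive an estimate of $v$, for a series of $n$ terms (which has $n - r$ differences of order $r$), by writing

$$\hat v = \frac{1}{\binom{2r}{r}}\cdot\frac{1}{n-r}\sum_{t=1}^{n-r}(\Delta^r\varepsilon_t)^2. \qquad (29.48)$$

It is to be noticed that we use the second moment about zero, not the observed variance
of $\Delta^r\varepsilon_t$, since the mean is known to be zero. This shortens the arithmetic to some extent.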

The factor $\binom{2r}{r}$ and its reciprocal for r = 1 to 10 have the following values:

 r    (2r)!/(r!)^2    (r!)^2/(2r)!
 1              2     0.5
 2              6     0.166667
 3             20     0.05
 4             70     0.0142857
 5            252     0.00396825
 6            924     0.00108225
 7          3,432     0.000291375
 8         12,870     0.0000777001
 9         48,620     0.0000205677
10        184,756     0.00000541254
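The identity behind (29.47) can be checked mechanically. The weights of Δ^r are the binomial coefficients with alternating signs, and the sum of their squares is $\binom{2r}{r}$. The following sketch verifies that identity and regenerates the factor table above. The helper name is hypothetical.

```python
from math import comb

def difference_weights(r):
    """Weights of u_{t+r}, u_{t+r-1}, ..., u_t in the r-th difference."""
    return [(-1) ** j * comb(r, j) for j in range(r + 1)]

for r in range(1, 11):
    w = difference_weights(r)
    factor = sum(x * x for x in w)          # sum of squared weights
    assert factor == comb(2 * r, r)         # Vandermonde: equals C(2r, r)
    print(f"{r:2d} {factor:8d} {1 / factor:.6g}")
```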


29.35. Basing itself on equation (29.48), the method of variate differences proceeds
as follows. We difference the series once, find the second moment about zero of the
resultant and divide by 2; we then difference again and find the second moment about zero,
dividing in this case by 6; and so on. If the successive estimates of v decrease, we
continue with the differencing. There will, in general, come a point when they cease decreasing
and remain constant within sampling limits (which may be rather wide). At this stage
we may suppose that we have eliminated the systematic element in the original series.
The final estimate gives us an estimate of the variance of the random element in the original
series, and the order of the difference to which we have had to go will give an indication
of the degree of the polynomial representing the systematic component.

Example 29.6

Let us apply the variate-difference technique to the series of Table 29.6. We know
from the method of constructing the series that the systematic part ought to be completely
eliminated after the third differencing, and also that the random part consists of an element
with variance 833 approximately. In fact, the random numbers from 1 to N have a
variance (N² − 1)/12, and N in this case is 100, giving 9999/12 = 833.25. The actual variance
of the random element in Table 29.6 is 843.
TABLE 29.9

Differences of the Series u_t of Table 29.6.

  t    u_t     Δu    Δ²u    Δ³u    Δ⁴u    Δ⁵u    Δ⁶u
  1    -96      6     67   -155    279   -508   1050
  2    -90     73    -88    124   -229    542  -1297
  3    -17    -15     36   -105    313   -755   1524
  4    -32     21    -69    208   -442    769  -1141
  5    -11    -48    139   -234    327   -372    271
  6    -59     91    -95     93    -45   -101    361
  7     32     -4     -2     48   -146    260   -229
  8     28     -6     46    -98    114     31   -625
  9     22     40    -52     16    145   -594   1661
 10     62    -12    -36    161   -449   1067  -2252
 11     50    -48    125   -288    618  -1185   1978
 12      2     77   -163    330   -567    793   -876
 13     79    -86    167   -237    226    -83   -159
 14     -7     81    -70    -11    143   -242    137
 15     74     11    -81    132    -99   -105    551
 16     85    -70     51     33   -204    446   -655
 17     15    -19     84   -171    242   -209    -64
 18     -4     65    -87     71     33   -273    690
 19     61    -22    -16    104   -240    417   -629
 20     39    -38     88   -136    177   -212    216
 21      1     50    -48     41    -35      4    175
 22     51      2     -7      6    -31    179   -650
 23     53     -5     -1    -25    148   -471   1110
 24     48     -6    -26    123   -323    639   -975
 25     42    -32     97   -200    316   -336     41
 26     10     65   -103    116    -20   -295    925
 27     75    -38     13     96   -315    630   -965
 28     37    -25    109   -219    315   -335    207
 29     12     84   -110     96    -20   -128    316
 30     96    -26    -14     76   -148    188    -32
 31     70    -40     62    -72     40    156   -798
 32     30     22    -10    -32    196   -642   1597
 33     52     12    -42    164   -446    955  -1719
 34     64    -30    122   -282    509   -764    950
 35     34     92   -160    227   -255    186    141
 36    126    -68     67    -28    -69    327   -991
 37     58     -1     39    -97    258   -664   1515
 38     57     38    -58    161   -406    851  -1492
 39     95    -20    103   -245    445   -641    707
 40     75     83   -142    200   -196     66    281
 41    158    -59     58      4   -130    347   -685
 42     99     -1     62   -126    217   -338    509
 43     98     61    -64     91   -121    171   -314
 44    159     -3     27    -30     50   -143    432
 45    156     24     -3     20    -93    289   -745
 46    180     21     17    -73    196   -456
 47    201     38    -56    123   -260
 48    239    -18     67   -137
 49    221     49    -70
 50    270    -21
 51    249
Table 29.9 shows the series and the differences up to Δ⁶. For the sums of squares S_j
of the entries in the columns of Δ^j u_t we find:

S_1 = 107,541
S_2 = 318,115
S_3 = 1,033,513
S_4 = 3,445,308
S_5 = 11,720,069
S_6 = 40,548,844


To obtain second moments we divide by 51 − j, and then, to obtain the estimate of v,
by $\binom{2j}{j}$. We find the following:

 j    Estimate
 1    1075.41
 2    1082.02
 3    1076.58
 4    1047.21
 5    1011.05
 6     975.20


Curiously enough, the estimate for j = 2 is higher than that for j = 1, and there is
little difference between the various estimates. In the ordinary way we should have
concluded that the systematic component was adequately represented by a polynomial
of order 1, that is to say a straight line, and that the residual random element had a variance
of about 1000.
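The arithmetic of this example can be reproduced directly. The sketch below takes the u_t column of Table 29.9 as data and forms successive forward differences. It then applies (29.48) in the form $\hat v_j = S_j/\{(51 - j)\binom{2j}{j}\}$. The function names are illustrative and are not part of any library; the sums and estimates it prints can be compared with those quoted in the text.

```python
from math import comb

# u_t of Table 29.6 (t = 1..51), as listed in the first column of Table 29.9.
U = [-96, -90, -17, -32, -11, -59, 32, 28, 22, 62,
     50, 2, 79, -7, 74, 85, 15, -4, 61, 39,
     1, 51, 53, 48, 42, 10, 75, 37, 12, 96,
     70, 30, 52, 64, 34, 126, 58, 57, 95, 75,
     158, 99, 98, 159, 156, 180, 201, 239, 221, 270,
     249]

def forward_diff(xs):
    """x_{t+1} - x_t for consecutive terms."""
    return [b - a for a, b in zip(xs, xs[1:])]

def variate_difference(u, max_order):
    """(r, S_r, estimate of v) for r = 1..max_order, following (29.48)."""
    rows = []
    d = list(u)
    for r in range(1, max_order + 1):
        d = forward_diff(d)                   # r-th differences: len(u) - r terms
        s_r = sum(x * x for x in d)           # sum of squares S_r
        v_hat = s_r / (len(d) * comb(2 * r, r))
        rows.append((r, s_r, v_hat))
    return rows

for r, s_r, v_hat in variate_difference(U, 6):
    print(f"{r}  {s_r:>11,}  {v_hat:8.2f}")
```

The stopping rule of 29.35 is then a matter of reading down the last column. Here the estimates barely move after the first differencing, as the text goes on to discuss.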

The reader must not be surprised to find discrepancies of this kind between theory
and experiment in short series; and the discrepancy is not, in fact, as big as it seems.
The variance of the original series is 6272.61. The mean square of the first differences,
divided by 2, is 1075.41, so that about five-sixths of the variance has been eliminated by
the first differencing, and the method indicates, quite correctly, that the greater part of
the systematic element is linear. The random element is rather large compared with the
non-linear systematic terms, and the latter have got caught up in it: the series is too short
for the variate-difference method to disentangle them. Consider, for instance, the cubic
term in $(t-26)^3$. In the original series this varies in value from −156.25 to +156.25.
First differences reduce it to a term in $(t-26)^2$, varying from 18.75 through zero to 18.75,
whereas the range of the random element is increased from 99 to 198. Already the systematic
term is being swamped by the random element, and a slight degree of accidental correlation
between the two can easily account for the increase in the mean square of second differences.

The matter may be put in a slightly different way. Suppose that, relying on the
variate-difference method, we regarded the data as represented by a linear equation plus
a random residual. If we fitted a straight line by least squares and examined the residuals,
we should probably find very little evidence of departure from randomness. This
representation would differ from the mode of construction of the series, but it would be a possible
method of construction. Only the failure of the representation to conform to further
terms of the series would reveal its weakness.
29.36. The variate-difference method thus provides a kind of lower limit to the
degree of the polynomial which will represent a series locally or generally. There remains
for consideration the question as to what sort of differences between successive estimates
of v can be regarded as chance effects, in order to decide when the value has reached a
stationary level. The sum of squares S_r is a constant factor times the second moment,
but as its members are correlated among themselves we cannot use the ordinary sampling
variance of the second moment to test its significance. Further, successive estimates
$\hat v_r$ and $\hat v_{r+1}$ are correlated. We proceed to derive the sampling variance of their
difference, the somewhat complicated formulae being due to Anderson (1914).


29.37. Write

$$h_j = \binom{r}{j}, \qquad j = 0, 1, \dots, r, \qquad (29.49)$$

so that

$$\Delta^r u_t = h_0u_{t+r} - h_1u_{t+r-1} + \dots + (-1)^r h_r u_t, \qquad (29.50)$$

and let

$$S_r = \sum_{t=1}^{n-r}(\Delta^r u_t)^2. \qquad (29.51)$$

Then we have, as in (29.42),

$$E(S_r) = (n-r)\mu_2\,(h_0^2 + h_1^2 + \dots + h_r^2) = (n-r)\binom{2r}{r}\mu_2, \qquad (29.52)$$

where $\mu_2$ is the variance of $u$. Further

$$E(S_r^2) = E\Big[\{h_0u_{r+1} - h_1u_r + \dots + (-1)^r h_r u_1\}^2 + \{h_0u_{r+2} - h_1u_{r+1} + \dots + (-1)^r h_r u_2\}^2 + \dots + \{h_0u_n - h_1u_{n-1} + \dots + (-1)^r h_r u_{n-r}\}^2\Big]^2. \qquad (29.53)$$

Consider first of all the terms in this which result in fourth powers of $u$. They will
derive from

$$E\Big\{h_0^2(u_1^2 + u_n^2) + (h_0^2 + h_1^2)(u_2^2 + u_{n-1}^2) + \dots + (h_0^2 + \dots + h_{r-1}^2)(u_r^2 + u_{n-r+1}^2) + (h_0^2 + \dots + h_r^2)(u_{r+1}^2 + \dots + u_{n-r}^2)\Big\}^2. \qquad (29.54)$$

Writing now

$$B_r = (h_0^2)^2 + (h_0^2 + h_1^2)^2 + \dots + (h_0^2 + h_1^2 + \dots + h_{r-1}^2)^2, \qquad (29.55)$$

we see that the term in $E(u^4)$ is

$$\left[(n-2r)\binom{2r}{r}^2 + 2B_r\right]E(u^4).$$

The only other terms appearing from (29.51) will be of type $E(u_i^2u_j^2)$. If the reader
will write out the expansion of (29.51) he will find that the coefficients are expressible in
terms of

$$A_s = (h_0h_s + h_1h_{s+1} + \dots + h_{r-s}h_r)^2 = \binom{2r}{r+s}^2 \qquad (29.56)$$

and

$$R_s = (h_0h_s)^2 + (h_0h_s + h_1h_{s+1})^2 + \dots + (h_0h_s + h_1h_{s+1} + \dots + h_{r-s-1}h_{r-1})^2. \qquad (29.57)$$
The expression for $E(S_r^2)$ reduces to

(n — 2r) E (u*) -{■ A {{n — + \) A\ (n — 2r + 2) A\ . . . 

+ Al (n -2r + r)}E («? <) + 2Rg E (»‘) 

-f 8 {R? -f Ri + . . . -f R?_i -f Bl)E(u^, (29.68) 

Dividing by $(n-r)^2\binom{2r}{r}^2$ and subtracting $v^2$,
we find the sampling variance of the estimate of v. The expression can, however, be
simplified to some extent. Putting

r-l y V « / VO r -2 ^ V ^ VO r-li / ^\2 / ^ \2 

+ 


‘■O'O' ‘ 


(29.59) 


we find, after lengthy algebraic rearrangement, 
var 


n 


(n - r) 


cy 


Har) 

^Rf)- 


n — r 


2 {n — r) 


r <. 


j 


(29.60) 


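If terms of order $(n-r)^{-2}$ can be neglected, this reduces to

$$\operatorname{var}(\hat v_r) \approx \frac{\mu_4 - 3\mu_2^2}{n-r} + \frac{2\mu_2^2}{n-r}\binom{4r}{2r}\Big/\binom{2r}{r}^2, \qquad (29.61)$$

or, using the Stirling approximation to factorials,

$$\operatorname{var}(\hat v_r) \approx \frac{1}{n-r}\left\{\mu_4 - 3\mu_2^2 + \mu_2^2\sqrt{2\pi r}\right\}, \qquad (29.62)$$

which is a fair approximation to (29.61), being within 3 per cent. for r as low as 6.

When the population of values of u is normal, $\mu_4 - 3\mu_2^2$ vanishes and the formula
simplifies accordingly.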
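For normal variation only the second term of (29.61) remains. The quality of the Stirling step can be checked by comparing $\binom{4r}{2r}/\binom{2r}{r}^2$ with $\sqrt{\pi r/2}$; twice $\mu_2^2$ times the latter gives the $\mu_2^2\sqrt{2\pi r}$ of (29.62). The following sketch tabulates the two factors and their relative discrepancy. The helper names are hypothetical.

```python
from math import comb, pi, sqrt

def exact_factor(r):
    """C(4r, 2r) / C(2r, r)**2, the factor multiplying 2*mu2**2/(n - r) in (29.61)."""
    return comb(4 * r, 2 * r) / comb(2 * r, r) ** 2

def stirling_factor(r):
    """Stirling approximation sqrt(pi*r/2); 2*mu2**2 times it is mu2**2*sqrt(2*pi*r)."""
    return sqrt(pi * r / 2)

for r in (1, 2, 4, 6, 10, 20):
    rel = stirling_factor(r) / exact_factor(r) - 1
    print(f"r={r:2d}  exact={exact_factor(r):.4f}  Stirling={stirling_factor(r):.4f}  rel. error={rel:+.2%}")
```

On this comparison the approximation is low by roughly 3 per cent. at r = 6, and the error shrinks as r grows.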

29.38. In a similar way it may be shown that 

r L. 1 




oov- 


(n - r) 


(") 


(n 


1 ) 


/2r + 2\ 

W + 1 /. 


^4 -- 3^1 I 1 

n — r 1 


2t; 


+ 


(*%: ‘1 


2» — 2r — 1_ r-fl 

“H r ZTi 2 (^-T^ 1) 


. (29.63) 
where


r 1 


From (29.60) and (29.63) we can determine the variance of the difference of successive
estimates $\hat v_r$ and $\hat v_{r+1}$. The general formula is complicated, but for normal variation,
large n and r > 6 we have, analogously to (29.62),

r s, s. 


var < 


(n 




{n — r 



(:ir { 1)V(^^0 

2 (2r 1- 1)** {n - -r 1) 


(n 



(29.64) 


The arithmetic application of the formulae has been facilitated by the preparation of tables 
of the constants involved. Reference may be made to Tintner (1940) who gives tables 
prepared by himself, Anderson and Zaycoff. 


Example 29.7

For the data of Table 29.3 (sheep population) an application of the variate-difference
method up to the tenth difference gave the following results:

 r           1     2     3     4     5     6     7     8     9    10
 Estimate  3468  1442   854   629   518   448   401   371   357   347

The values here are falling steadily from r = 1 to r = 10, but only very slightly towards
the end. From (29.64) we have for the variance of the difference, for r = 6, 80.7
approximately, and for r = 10, 25.8 approximately. It appears that the reduction in variance
at r = 10 is losing significance, and that a moving arc of degree 10 would be sufficient to
eliminate the systematic component. It does not, of course, follow that the trend-line
must be of this degree, for we may not want to eliminate the oscillatory movements in
the trend-line.


29.39. The variate-difference method will clearly not eliminate systematic effects
such as periodic terms with very short period. Consider, for instance, the series 1, −1,
1, −1, etc. The first differences give us a series 2, −2, 2, −2, etc., second differences
4, −4, 4, −4, etc., and so on. The variance of the series of rth differences is, neglecting
effects due to the shortness of the series, $2^{2r}$ times that of the original, and the quotient
when this is divided by $\binom{2r}{r}$ tends to

$$\frac{2^{2r}(r!)^2}{(2r)!} \sim \sqrt{\pi r},$$

and so increases without limit. In such a case we cannot obtain an estimate of the variance
of any random element which may be present.
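The growth of this quotient is easy to tabulate. The following sketch checks directly that each differencing of 1, −1, 1, … doubles the amplitude. It then compares the exact ratio $2^{2r}/\binom{2r}{r}$ with $\sqrt{\pi r}$. The function names are illustrative.

```python
from math import comb, pi, sqrt

def forward_diff(xs):
    return [b - a for a, b in zip(xs, xs[1:])]

def quotient(r):
    """Variance of the r-th differences of 1, -1, 1, ... (2**(2r) times the
    original variance) divided by C(2r, r)."""
    return 2 ** (2 * r) / comb(2 * r, r)

xs = [(-1) ** t for t in range(30)]
d = xs
for r in range(1, 4):
    d = forward_diff(d)
    assert all(abs(x) == 2 ** r for x in d)   # amplitude doubles at each differencing

for r in (1, 2, 5, 10, 20, 50):
    print(r, round(quotient(r), 3), round(sqrt(pi * r), 3))
```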


NOTES AND REFERENCES 

References to the fitting of polynomials are given at the end of Chapter 22. For the 
moving average see Whittaker and Robinson’s Calculus of Observations and the books by 
Macaulay (1931) and Sasuly (1934). 

Attempts have been made to use trend-lines for purposes of forecasting, and even to
measure the standard error of a forecast; see Schultz (1930) and a discussion in Davis
(1941). The methods proposed appear to me theoretically unsound, and in practice they
lead as a rule to such wide limits of error as to be of doubtful value; but this is a personal
opinion, and the less sceptical reader may care to consult Davis’s book and to follow up
the references given therein.

For the effect of moving averages on random variables see Yule (1921) and Slutzky
(1937b), the latter being an English version of a paper published in Russian many years
earlier. See also Dodd (1939a, 1941a). Slutzky proves an interesting theorem, the
theorem of the sinusoidal limit, to the effect that repeated moving averages of certain
kinds applied to random series generate a sine-curve.

For the variate-difference method see the book by Tintner (1940), a very thorough 
practical account with useful tables. The more important earlier memoirs are those by 
Anderson (1914, 1923, 1926), “Student” (1914), Morant (1921), and K. Pearson and 
Cave (1914). 


EXERCISES 

29.1. Show that in the formulae of equation (29.7) and similar formulae of higher
orders the sum of the weights is unity.

29.2. By evaluating the solutions of (29.5) determinantally, show that a parabolic
curve of second or third order giving a graduation

$$a_{-n}u_{-n} + a_{-(n-1)}u_{-(n-1)} + \dots + a_0u_0 + \dots + a_nu_n$$

has

$$a_j = \frac{3\{(3n^2 + 3n - 1) - 5j^2\}}{(2n-1)(2n+1)(2n+3)}.$$
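The result of Exercise 29.2 can be checked with exact rational arithmetic. For symmetric abscissae j = −n, …, n the odd powers decouple from the even ones in the normal equations. Consequently the central ordinate of a fitted cubic coincides with that of a fitted quadratic, and the weight on u_j is $(s_4 - s_2j^2)/\{(2n+1)s_4 - s_2^2\}$ with $s_k = \sum j^k$. The following sketch compares this least-squares weight with the closed form; the function names are illustrative.

```python
from fractions import Fraction

def ls_central_weights(n):
    """Weights a_j (j = -n..n) giving the central ordinate of a least-squares
    quadratic (or cubic) fitted to 2n+1 equally spaced terms."""
    js = range(-n, n + 1)
    s2 = sum(j**2 for j in js)
    s4 = sum(j**4 for j in js)
    denom = (2 * n + 1) * s4 - s2 * s2
    return [Fraction(s4 - s2 * j * j, denom) for j in js]

def formula_weights(n):
    """Closed form stated in the exercise."""
    d = (2 * n - 1) * (2 * n + 1) * (2 * n + 3)
    return [Fraction(3 * (3 * n * n + 3 * n - 1 - 5 * j * j), d) for j in range(-n, n + 1)]

for n in range(1, 11):
    assert ls_central_weights(n) == formula_weights(n)
    assert sum(formula_weights(n)) == 1          # cf. Exercise 29.1
print([str(w) for w in formula_weights(2)])      # ['-3/35', '12/35', '17/35', '12/35', '-3/35']
```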

29.3. Show that the weights in the Spencer 21-point formula are

$$\frac{1}{350}[-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, \dots]$$

and that if it is applied to a random series the variance of the resultant is about one-seventh
of that of the original series, about the same reduction as would be given by a simple moving
average of sevens.


29.4. Show that Macaulay’s 43-point formula, 

[12] [8] [.5P ^1, - I, 0, 0, 0. 0, 0, 0, 1, ... J, 

has weights 

%60 ~ - 122, - 178, - 205, - 190, - 127, 

— 6, 163, 360, 562, 760, 928, 1050, 1127, 1156, . . .] 

and that it reduces the variance of a random series about as much as a simple average 
of nines. 


29.5. Take a random series of, say, 200 terms and determine “trends” by moving
averages $\frac{1}{9}[9]$, $\frac{1}{81}[9]^2$ and $\frac{1}{729}[9]^3$. Compare the mean distances between peaks and
upcrosses with the theoretical values based on normal theory.


29.6. If $\varepsilon_t$ is a random series, show that the correlation between successive members
of $\Delta^k\varepsilon_t$ for long series is $-k/(k+1)$, and hence tends to −1 as k increases. Hence show
that the signs of successive terms in $\Delta^k u_t$ tend to alternate, where $u_t$ is the sum of a random
element and a systematic element representable by a polynomial; and verify by reference
to Table 29.9.
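The following is a quick exact check of the first part of Exercise 29.6. Take $\varepsilon_t$ to be uncorrelated with constant variance. Then the lag-one autocorrelation of $\Delta^k\varepsilon_t$ is the sum of products of adjacent weights divided by the sum of squared weights. The helper name is hypothetical.

```python
from fractions import Fraction
from math import comb

def lag_autocorrelation(k, lag):
    """Exact autocorrelation at `lag` of the k-th differences of an uncorrelated series."""
    c = [(-1) ** j * comb(k, j) for j in range(k + 1)]
    num = sum(c[j] * c[j + lag] for j in range(k + 1 - lag))
    den = sum(x * x for x in c)
    return Fraction(num, den)

for k in range(1, 8):
    assert lag_autocorrelation(k, 1) == Fraction(-k, k + 1)
    print(k, lag_autocorrelation(k, 1))
```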


29.7. By eliminating from (29.19) show that, for a cubic curve, an accurate trend- 
line is given by 



and generalise this result. 

(Cf. J. A. Higham, J. Inst. Act. (1882-6), 23, 335; 25, 15, 246.)



CHAPTER 30 

TIME-SERIES— (2) 


30.1. The present chapter is devoted to a discussion of oscillatory effects in time- 
series. We shall suppose that our series is stationary, i.e. has no trend, either because the 
original data contained none or because trend has been removed by one of the methods 
described in the last chapter. Our typical series will then fluctuate round some constant 
value which we may usually, without loss of generality, take to be zero. We shall assume 
that there is a prior possibility that part of the variation at least is random. This, indeed,
is necessary if our results are to have any practical application, for most of the series
encountered in practice have some element of irregularity, however small.


TABLE 30.1

Trend-free Wheat-Price Index (European Prices) compiled by Sir William Beveridge for
the Years 1500-1869.

(From Beveridge, 1921.)

Year   X   Year   X   Year   X   Year   X   Year   X   Year   X   Year   X   Year   X   Year   X   Year   X
1500 106   1537  73   1574 113   1611 100   1648 122   1685  74   1722  91   1759  91   1796  95   1833  80
1501 118   1538  86   1575  89   1612  99   1649 134   1686  76   1723  94   1760  88   1797  84   1834  78
1502 124   1539  74   1576  87   1613 100   1650 119   1687  66   1724 110   1761 100   1798  87   1835  82
1503  94   1540  74   1577  87   1614  94   1651 136   1688  62   1725 111   1762  97   1799 120   1836  88
1504  82   1541  76   1578  79   1615  88   1652 102   1689  76   1726 103   1763  88   1800 139   1837 102
1505  88   1542  80   1579  90   1616  92   1653  72   1690  79   1727  94   1764  95   1801 117   1838 117
1506  87   1543  96   1580  90   1617 100   1654  63   1691  97   1728 101   1765 101   1802 105   1839 107
1507  88   1544 112   1581  87   1618  82   1655  76   1692 134   1729  90   1766 106   1803  94   1840  95
1508  88   1545 144   1582  83   1619  73   1656  76   1693 169   1730  96   1767 113   1804 125   1841 101
1509  68   1546  80   1583  86   1620  81   1657  77   1694 111   1731  80   1768 108   1805 114   1842  92
1510  98   1547  54   1584  76   1621  99   1658 103   1695 109   1732  76   1769 108   1806  98   1843  88
1511 115   1548  69   1585 110   1622 124   1659 104   1696 111   1733  84   1770 131   1807  93   1844  92
1512 135   1549 100   1586 161   1623 106   1660 120   1697 128   1734  91   1771 136   1808  94   1845 115
1513 104   1550 103   1587  97   1624 106   1661 167   1698 163   1735  94   1772 119   1809  94   1846 139
1514  96   1551 129   1588  84   1625 121   1662 126   1699 137   1736 101   1773 106   1810 104   1847  90
1515 110   1552 100   1589 106   1626 105   1663 108   1700  99   1737  93   1774 105   1811 140   1848  80
1516 107   1553  90   1590 111   1627  84   1664  91   1701  85   1738  91   1775  88   1812 121   1849  74
1517  97   1554 100   1591  97   1628  97   1665  85   1702  72   1739 122   1776  84   1813  96   1850  78
1518  75   1555 123   1592 108   1629 109   1666  73   1703  88   1740 159   1777  94   1814  96   1851  86
1519  86   1556 156   1593 100   1630 148   1667  74   1704  77   1741 110   1778  87   1815 130   1852 105
1520 111   1557  71   1594 119   1631 114   1668  80   1705  66   1742  90   1779  79   1816 178   1853 138
1521 125   1558  71   1595 131   1632 108   1669  74   1706  64   1743  81   1780  87   1817 126   1854 141
1522  78   1559  81   1596 143   1633  97   1670  78   1707  69   1744  84   1781  88   1818  94   1855 138
1523  86   1560  84   1597 138   1634  92   1671  83   1708 125   1745 102   1782  94   1819  86   1856 107
1524 102   1561  97   1598 112   1635  97   1672  84   1709 175   1746 102   1783  94   1820  84   1857  82
1525  71   1562 105   1599  99   1636  98   1673 106   1710 108   1747 100   1784  92   1821  76   1858  81
1526  81   1563  90   1600  97   1637 105   1674 134   1711 103   1748 109   1785  85   1822  77   1859  97
1527 120   1564  78   1601  80   1638  97   1675 122   1712 115   1749 104   1786  84   1823  71   1860 116
1528 130   1565 112   1602  90   1639  93   1676 102   1713 134   1750  90   1787  93   1824  71   1861 107
1529 129   1566 100   1603  90   1640  99   1677 107   1714 108   1751  99   1788 108   1825  69   1862  92
1530 125   1567  86   1604  80   1641  99   1678 115   1715  90   1752  95   1789 108   1826  82   1863  79
1531 130   1568  77   1605  77   1642 107   1679 113   1716  89   1753  90   1790  86   1827  93   1864  81
1532  97   1569  80   1606  81   1643 106   1680 104   1717  89   1754  80   1791  78   1828 114   1865  94
1533 ...   1570  93   1607  98   1644  96   1681  92   1718  94   1755  85   1792  87   1829 103   1866 119
1534  76   1571 112   1608 115   1645  82   1682  84   1719 107   1756 117   1793  85   1830 110   1867 118
1535 102   1572 131   1609  94   1646  88   1683  86   1720  89   1757 112   1794 103   1831 105   1868  93
1536 100   1573 158   1610  93   1647 116   1684 101   1721  79   1758  95   1795 130   1832  82   1869 102

30.2. Four examples of the type of series under consideration have already occurred. 
The table of Example 21.11 (page 126) gives the deviations from a simple nine-year moving 
average of the yields of potatoes in tenths of tons per acre in England and Wales for the 
years 1888-1935. Table 29.1 (Fig. 29.1) gives the annual yields of barley in cwts. per 
acre in England and Wales for 1884-1939, no nine-year elimination of trend having been 
carried out in this case. Table 29.4 (Fig. 29.4) gives rainfall data at London over the 
century 1813-1912. Table 29.5 (Fig. 29.5) gives egg-production per laying hen in the 
U.S.A. 


TABLE 30.2

Marriage Rate in England and Wales: Deviation from a Simple 11-Year Moving Average
for the Years 1843-1896.

Units: 1 in 10,000.
Tables 30.1 and 30.2 give two further examples. The first is a famous series of trend-free
wheat-price indices compiled by Sir William Beveridge and extending over 370 years,
a phenomenal length of time for economic series. The second is the deviation from a
simple 11-year moving average of marriage rates for the years 1843-1896.

Oscillation and Cycle 

30.3. We will now attempt to define more closely the sense in which we use the
words “oscillation” and “cycle”. It is particularly important to exercise great care in
the use of an accurate nomenclature, because a great deal of the literature on this subject
suffers from confusion due to loose wording.
By a cyclical component of a time-series we shall mean one which is a strictly periodic
function of the time, that is to say, for which there exists a period ω such that

$$u_t = u_{t+\omega} = u_{t+2\omega} = \dots = u_{t+k\omega} = \dots \qquad (30.1)$$

whatever the value of t. The periodic functions which we shall consider in particular are
the sine and cosine functions. If the series can be represented as the sum of a cyclical
component and a random constituent, or by a cyclical component alone, we may speak
of it as a cyclical series.


30.4. If the series is not random it must move with more or less regularity about
the mean value, and we shall then speak of it as oscillatory. The oscillatory movement
may be in part due to random elements but must not be entirely so. A cyclical series is
oscillatory, but an oscillatory series is not necessarily cyclical.

An oscillatory movement may be the sum of two or more cyclical components. Consider,
for instance, the sum of two periodic terms

$$u_t = \sin\frac{2\pi t}{\omega_1} + \sin\frac{2\pi t}{\omega_2}.$$

If $\omega_1$ and $\omega_2$ are commensurable there will be numbers, and in particular a smallest number
$\omega$, which are exact multiples of both of them. This is clearly a period of the series.
But if $\omega_1$ and $\omega_2$ are not commensurable there will be no period of this kind and the sum
will be oscillatory but not cyclical.
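When the two periods are rational numbers, the smallest common period can be computed exactly. For $\omega_1 = p_1/q_1$ and $\omega_2 = p_2/q_2$ in lowest terms it is $\operatorname{lcm}(p_1, p_2)/\gcd(q_1, q_2)$. The following sketch assumes rational periods represented as `Fraction`s, and needs Python 3.9+ for `math.lcm`. Incommensurable periods such as 1 and √2 have no common period and cannot be expressed this way. The helper names are illustrative.

```python
from fractions import Fraction
from math import gcd, lcm, pi, sin

def common_period(w1, w2):
    """Smallest positive number that is a whole multiple of both (rational) periods."""
    w1, w2 = Fraction(w1), Fraction(w2)
    return Fraction(lcm(w1.numerator, w2.numerator), gcd(w1.denominator, w2.denominator))

def two_sines(t, w1, w2):
    """u_t = sin(2*pi*t/w1) + sin(2*pi*t/w2)."""
    return sin(2 * pi * t / w1) + sin(2 * pi * t / w2)

w1, w2 = Fraction(5, 2), Fraction(4)
omega = common_period(w1, w2)
f1, f2, fw = float(w1), float(w2), float(omega)
# The sum repeats after omega, up to floating-point rounding.
assert all(abs(two_sines(t + fw, f1, f2) - two_sines(t, f1, f2)) < 1e-9 for t in range(100))
print(omega)   # 20
```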


30.5. It may be felt by the reader that we could reasonably extend the use of the
word “cyclical” to cover series which are the sum of cyclical terms; but the danger of
doing so is that within certain limits any series can be represented as a sum of harmonic
terms, even if it is not itself oscillatory, in virtue of Fourier’s theorem. Admittedly such
a representation, to be exact, must in general consist of an infinite series of terms and is
valid only in a certain range, but in practice a comparatively small number of terms often
gives quite a good approximation. We do not call a function a polynomial because it
can be expanded in powers of the variable by Taylor’s theorem; and correspondingly
we shall not call it cyclical because it can be expanded as a sum of harmonic terms by
Fourier’s theorem. On the whole it seems safer to avoid the word “cyclical” for series
which consist of a finite number of cyclical terms.

30.6. For our present purposes the main significance of the distinction we are attempting
to make is that in a cyclical series the maxima and minima (apart from disturbances
due to the superposition of a random element) occur at equal intervals of time and are
therefore predictable for a long way ahead, or so long as the configuration
of the system remains unchanged. In oscillatory series, on the other hand, the intervals
from peak to peak, trough to trough, or upcross to upcross are not equal but vary very
considerably. Similarly, in the oscillatory series the amplitudes of the movements may
vary very substantially, whereas in a cyclical series they should be constant (again, except
in so far as superposed random elements disturb them).

30.7. Now the time-series observed in practice are very rarely cyclical as we have
defined the term. The only case among those cited at the beginning of the chapter in which
there appears to be any cyclical movement is that of egg-production per hen in Table 29.5.
The far more usual case is that of varying amplitude and period from peak to peak or upcross
to upcross. We shall therefore begin our study of oscillatory movements by considering
the kinds of scheme which can give rise to the observed phenomena; and then we shall
examine methods of deciding which of the possible schemes should be chosen as the
hypothetical representation in particular cases.

Tests for Randomness

30.8. The first stage, when confronted with a fluctuating stationary series, is to
examine whether the fluctuations are purely random. Tests of randomness are easy to
find, and in fact the random series is the happy hunting-ground of the worker whose interests
lie mainly in the mathematics of the direct theory of probability. We have considered
some tests which are appropriate to the study of oscillatory movement in 21.43 to 21.46.
Others which have gained popularity are based on the distribution of “runs” and on the
correlation between successive members of the series. The reader will have no difficulty
in composing others. All these tests are based on the non-parametric case, so that the
alternative hypotheses are not usually brought specifically into view. We cannot therefore
apply the general theory of Chapters 26 and 27 to determine “best” tests, and in the
present state of knowledge we are forced to be content with less definite ideas. So far as
ease of application goes, the tests of 21.43 and 21.44 seem to have decided advantages,
though they may be somewhat insensitive. The method of serial correlation, to which we
refer below, gives a useful alternative in doubtful cases. In the sequel we shall suppose
that before proceeding to search for systematic movements we have satisfied ourselves by
one or more of these tests that such movements exist.
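The following illustrates one simple and widely used check of this kind, which counts turning points (peaks and troughs). For n independent values from a continuous distribution the count has mean 2(n − 2)/3 and variance (16n − 29)/90. The sketch below is an assumed implementation of that count, and it is not claimed to coincide exactly with the tests of 21.43 and 21.44. Equal neighbours are simply not counted as turning points. The function names are illustrative.

```python
from math import sqrt

def turning_points(xs):
    """Number of strict interior peaks and troughs; equal neighbours do not count."""
    return sum(
        1
        for a, b, c in zip(xs, xs[1:], xs[2:])
        if (b > a and b > c) or (b < a and b < c)
    )

def turning_point_test(xs):
    """(count, mean, variance, standardized deviate) under the hypothesis that xs
    are independent values from a continuous distribution."""
    n = len(xs)
    p = turning_points(xs)
    mean = 2 * (n - 2) / 3
    var = (16 * n - 29) / 90
    return p, mean, var, (p - mean) / sqrt(var)

# Terms 41-60 of Table 30.3 (a sine wave plus a rectangular random residual).
segment = [6, 12, 7, 5, 3, -2, -12, -12, -8, -1, 11, 13, 12, 7, 5, -1, -6, -14, -8, 1]
print(turning_point_test(segment))   # 3 turning points against an expected 12
```

Far too few turning points, as here, point to a systematic oscillation smoother than random variation would give.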

30.9. We shall consider three schemes which can account for the typical oscillatory
movement usually observed.

(a) Moving Averages. We have already seen in Chapter 29 that a moving average
of a purely random element can generate an oscillatory series with all the required properties
of varying amplitude and mean distances: the Slutzky-Yule effect (29.25). Fig. 29.6
illustrates the kind of oscillation which may arise. It is at least possible that some of the
observed oscillations in time-series may be generated in this way; and in fact Slutzky
(1936) has given an interesting example in which a part of his series generated by the
moving average happens to agree very closely with an observed series.

(b) Sums of Cyclical Components. We may attempt, by Fourier analysis or the more
general harmonic analysis, to represent the oscillations as the sum of a number of cyclical
components. This is the classical approach.

(c) Autoregressive Equations. If a series is constructed by the recurrence formula

$$u_{t+1} = f(u_t, u_{t-1}, \dots, u_{t-k}) + \varepsilon_{t+1}, \qquad (30.2)$$

where f is a mathematical function and ε a “disturbance” function which may be a random
variable, then under certain conditions the generated series is of the required type. We
shall consider in particular the series

$$u_{t+2} = au_{t+1} + bu_t + \varepsilon_{t+2}, \qquad (30.3)$$

where a and b are constants and ε is random.

Table 30.3 (Fig. 30.1) shows a series of type (b) in the simplest case where only one
cyclical component is involved, together with a random residual. Table 30.4 (Fig. 30.2)
shows an autoregressive series constructed from random numbers by the formula

$$u_{t+2} = 1.1u_{t+1} - 0.5u_t + \varepsilon_{t+2}. \qquad (30.4)$$
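The following minimal sketch shows how a series like Table 30.4 can be generated from (30.4). Two details are assumptions. First, the rounded rectangular disturbances are drawn as integers from −9 to 9; rounding a rectangular variable on (−9.5, 9.5) gives exactly these values, equally likely. Second, each new term is rounded half-up to the nearest unit. The sketch also reports the modulus and period implied by the roots of x² − 1.1x + 0.5 = 0, which govern the damping and the oscillation of the undisturbed recurrence. For instance, the disturbances −9, 6, 4 carry the first two printed terms (7, 6) to −6, −4, 3, in agreement with the next three printed terms. The random numbers actually used for the table are not given in the text. The function names are hypothetical.

```python
import cmath
import math
import random

def ar2_series(u1, u2, eps):
    """u_{t+2} = 1.1 u_{t+1} - 0.5 u_t + eps_{t+2}, each new term rounded half-up
    to the nearest unit (an assumed convention). Works in tenths to stay exact."""
    u = [u1, u2]
    for e in eps:
        tenths = 11 * u[-1] - 5 * u[-2] + 10 * e   # ten times the unrounded value
        u.append((tenths + 5) // 10)                # round half up
    return u

def damping_and_period(a, b):
    """Modulus and implied period of the roots of x^2 - a x - b = 0,
    the characteristic equation of u_{t+2} = a u_{t+1} + b u_t."""
    root = (a + cmath.sqrt(a * a + 4 * b)) / 2
    angle = abs(cmath.phase(root))
    period = 2 * math.pi / angle if angle else math.inf
    return abs(root), period

rng = random.Random(2024)                 # arbitrary seed
eps = [rng.randint(-9, 9) for _ in range(63)]
series = ar2_series(7, 6, eps)            # 65 terms, starting like Table 30.4
print(damping_and_period(1.1, -0.5))      # modulus sqrt(0.5), period a little over 9
```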
TABLE 30.3

Values of the Series $u_t = 10\sin\frac{\pi t}{5} + \varepsilon_t$, where $\varepsilon_t$ is a Rectangular Random
Variable with Range −5 to +5, rounded off to Nearest Unit.

Number     Value of     Number     Value of     Number     Value of
of Term    Series       of Term    Series       of Term    Series
   1          3            21         11           41          6
   2          8            22         13           42         12
   3          6            23         10           43          7
   4          2            24          6           44          5
   5         -4            25         -5           45          3
   6         -7            26         -8           46         -2
   7        ...            27        -12           47        -12
   8         -9            28        -10           48        -12
   9        -10            29         -7           49         -8
  10         -1            30          0           50         -1
  11          8            31          1           51         11
  12          7            32          8           52         13
  13          6            33         13           53         12
  14          4            34          7           54          7
  15         -3            35          4           55          5
  16        -10            36        ...           56         -1
  17        -11            37         -9           57         -6
  18        -15            38         -6           58        -14
  19         -4            39         -4           59         -8
  20          4            40         -2           60          1
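For the sinusoidal scheme of Table 30.3, the following sketch pairs a generator with a consistency check on the printed values. With |ε_t| ≤ 5 and rounding to the nearest unit, every residual u_t − 10 sin(πt/5) should lie within ±5.5. The check uses terms 41 to 60 only, the generator's seed is arbitrary, and the function name is illustrative.

```python
import math
import random

def sine_plus_noise(n, amplitude=10.0, period=10.0, half_range=5.0, seed=None):
    """u_t = amplitude*sin(2*pi*t/period) + eps_t, eps_t rectangular on
    (-half_range, half_range), each term rounded to the nearest unit."""
    rng = random.Random(seed)
    return [
        round(amplitude * math.sin(2 * math.pi * t / period) + rng.uniform(-half_range, half_range))
        for t in range(1, n + 1)
    ]

# Terms 41-60 of Table 30.3 as printed; the systematic part is 10 sin(pi t / 5).
printed = dict(zip(range(41, 61), [6, 12, 7, 5, 3, -2, -12, -12, -8, -1,
                                   11, 13, 12, 7, 5, -1, -6, -14, -8, 1]))
residuals = {t: u - 10 * math.sin(math.pi * t / 5) for t, u in printed.items()}
print(max(abs(r) for r in residuals.values()))   # should not exceed 5.5
print(sine_plus_noise(20, seed=1))               # a fresh series of the same kind
```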
TABLE 30.4

Values of Series $u_{t+2} = 1.1u_{t+1} - 0.5u_t + \varepsilon_{t+2}$, where $\varepsilon$ is a Rectangular Random
Variable with Range −9.5 to 9.5, rounded off to Nearest Unit.


Number 

Value of 

Number 

1 Value of 

Number 

Value of 1 

of Term. 

Series. 

of Term. 

i Scries. 

i 

of Term. 

Series. ^ 

1 

7 

23 

! 

i 

- 4 

45 

- 13 

2 

6 

24 

- 5 

46 

1 

3 

-- 6 

25 

- 9 

47 

6 

4 

- 4 

26 

- 4 

48 

4 

5 

3 

27 

- 4 

49 


0 
28

3 

50 

15 ! 

7 
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29 
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51 

9 

8 

- 1 

30 

4 

52 


9 

10 

31 
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53 

4 

10 

' 10 

32 

- 6 

54 

! - 1 

11 

6 

33 

- 3 

55 

! 4 

12 

- 4 

34 

- 2 

56 

1 7 

13 

~ 4 

35 

0 

57 


14 

1 - 7 

36 

- 1 

58 

0 ! 

15 

- 2 

37 

3 

59 

1 

16 

6 

38 

3 

60 

i 0 i 

17 

17 

39 

- 1 

61 

: 5 j 

18 

; 24 

40 

-- 8 

62 

- 11 1 

19 

17 

41 

- 3 

63 

: - ^ i 

20 

4 

42 

- 8 

64 

1 — 3 j 

21 

1 

43 

-- 10 

65 

! 5 I 

22 

I - 5 

44 

- 16 


i ! 



Fig. 30.2. — Graph of the Values of Table 30.4. 
30.10. It is quite possible that theoretical reasons may suggest other schemes for study as the subject progresses. For instance, we might wish to consider series defined by differential equations, on the analogy of the similar equations determining oscillations in physical phenomena such as vibrating strings or electrical discharges. Something has, in fact, already been done in this direction. We shall, however, confine our attention to the three schemes indicated above, and particularly the second and third.

30.11. On the face of it, an observed series exhibiting the typical movements in 
amplitude and period might be due to any one of the three schemes or even to a combination 
of them. We require, in the first instance, some objective criterion for deciding which of 
them is applicable in particular cases. Inspection of the primary data, though useful, is 
quite an unreliable guide in making a decision on this point, particularly if the series 
is short. Experience seems to indicate that few things are more likely to mislead in the 
theory of oscillatory series than attempts to determine the nature of the oscillatory move- 
ment by mere contemplation of the series itself ; and yet this is the method, if one can 
dignify it by such a term, which has perhaps been most widely used in the past.

Serial Correlation

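30.12. Suppose our series of values is $u_1, u_2, \dots, u_n$. Let us form the product-moment correlation coefficient between successive terms, i.e.

$$r_1 = \frac{\operatorname{cov}(u_j, u_{j+1})}{\{\operatorname{var} u_j \operatorname{var} u_{j+1}\}^{1/2}}. \qquad (30.5)$$

There will be $(n - 1)$ pairs entering into the correlation, and the variances of $u_j$ and $u_{j+1}$ differ only in the fact that the first relates to the terms $u_1, \dots, u_{n-1}$ and the second to the terms $u_2, \dots, u_n$. The coefficient $r_1$ is called the serial correlation coefficient of the first order, or more briefly the first serial correlation.*

More generally, let us define a coefficient of order $k$:

$$r_k = \frac{\operatorname{cov}(u_j, u_{j+k})}{\{\operatorname{var} u_j \operatorname{var} u_{j+k}\}^{1/2}}, \qquad (30.6)$$

the covariance and the two variances being calculated from the $n - k$ pairs $(u_j, u_{j+k})$, $j = 1, \dots, n - k$. We also write

$$r_{-k} = r_k, \qquad (30.7)$$

and by convention we define

$$r_0 = 1. \qquad (30.8)$$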
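The definition of $r_k$ can be written directly as code. This sketch takes the series as a plain Python list. It follows the convention used in Example 30.1 below, in which each variance is taken about its own mean over the $n - k$ terms it covers. The name `serial_correlation` is hypothetical.

```python
import math


def serial_correlation(u, k):
    """Serial correlation of order k of a finite series u_1, ..., u_n.

    The n - k pairs (u[j], u[j+k]) are used. The variance of the first member
    is taken over the first n - k terms and that of the second over the last
    n - k terms, each about its own mean.
    """
    if k == 0:
        return 1.0                      # by convention r_0 = 1
    k = abs(k)                          # r_{-k} = r_k
    n = len(u)
    x, y = u[: n - k], u[k:]
    m = n - k
    mx, my = sum(x) / m, sum(y) / m
    cov = sum(xi * yi for xi, yi in zip(x, y)) / m - mx * my
    var_x = sum(xi * xi for xi in x) / m - mx * mx
    var_y = sum(yi * yi for yi in y) / m - my * my
    return cov / math.sqrt(var_x * var_y)


series = [-5, -6, -2, 4, 7, 3, 1, -5, -1, 2]
print([round(serial_correlation(series, k), 3) for k in range(6)])
```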

* It is sometimes convenient to confine this expression to values calculated from samples, the corresponding values for the infinite series being termed "autocorrelations" and denoted by a Greek $\rho$.

30.13. In practice we often need to calculate a considerable number of serial correlations, for long series as many as 60. The arithmetic is tedious but may be systematised so as to reduce labour, which arises chiefly in the determination of the cross-products forming the covariances.

The series of $n$ terms is written down vertically on each of two slips of paper, the spacing being equal on the two slips. This can very conveniently be done on a Burroughs tabulator with a split keyboard, the series being recorded in duplicate and the resulting strip cut up the middle. To calculate the first product-sum we pin the slips so that the first term on the right-hand slip is opposite the second term on the left-hand slip, and hence so that the $j$th term on the right is opposite the $(j + 1)$th on the left all the way down. For most series the differences of two terms which are opposite can be obtained mentally by subtraction, squared, and set up on an adding-machine. The sum of squares of differences is thus determined, and the cross-product found from the simple identity

$$2\,\Sigma(XY) = \Sigma(X^2) + \Sigma(Y^2) - \Sigma(X - Y)^2.$$

We then move the right-hand slip down one space so that the $j$th term is opposite the $(j + 2)$th term on the left and repeat the process; and so on to as many terms as may be required.

In this process $\Sigma(X^2)$ and $\Sigma(Y^2)$ are required at each stage, and it is as well to determine them by cumulative summation from the two ends of the series. $\Sigma(X)$ and $\Sigma(Y)$ are also required. It is also convenient on occasion to reduce the series to zero mean approximately before beginning the analysis.

Example 30.1

To illustrate the arithmetic we will take a very trivial example which the reader should check for himself. Take the series

-5, -6, -2, 4, 7, 3, 1, -5, -1, 2.

We set up the following scheme of tabulation for calculating serial correlations up to the fifth order ($\Sigma(X)$ and $\Sigma(X^2)$ taken from the beginning of the series, $\Sigma(Y)$ and $\Sigma(Y^2)$ from the end):

  n-k   k   Σ(X)   Σ(Y)   Σ(X²)   Σ(Y²)   Σ(X-Y)²   Σ(XY)
   10   0    -2     -2     170     170        0       170
    9   1    -4      3     166     145      143        84
    8   2    -3      9     165     109      344       -35
    7   3     2     11     140     105      445      -100
    6   4     1      7     139      89      380       -76
    5   5    -2      0     130      40      172        -1

The number $n - k$ is the number of pairs entering into the $k$th correlation. $\Sigma(X)$ is the sum of $n - k$ terms beginning at the first term, $\Sigma(Y)$ the corresponding sum of the last $n - k$ terms, and similarly for $\Sigma(X^2)$ and $\Sigma(Y^2)$. These are the quantities required to calculate the variances entering into the denominator of the $k$th serial correlation. The quantities $\Sigma(X - Y)^2$ are calculated by the moving-slip method described above.
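The moving-slip routine can be mimicked as follows. In this sketch the four single-variable sums come from cumulative totals taken from the two ends of the series, as recommended in 30.13. $\Sigma(XY)$ is then recovered from the identity $2\Sigma(XY) = \Sigma(X^2) + \Sigma(Y^2) - \Sigma(X-Y)^2$ rather than by direct multiplication. The function name is hypothetical. Applied to the ten-term series of this example, it should reproduce the rows of the table above.

```python
def moving_slip_table(u, max_k):
    """Rows (n-k, k, S(X), S(Y), S(X^2), S(Y^2), S((X-Y)^2), S(XY)).

    X runs over the first n-k terms and Y over the last n-k terms.
    """
    n = len(u)
    cum, cum_sq = [0], [0]                 # cumulative sums from the beginning
    for v in u:
        cum.append(cum[-1] + v)
        cum_sq.append(cum_sq[-1] + v * v)
    rows = []
    for k in range(max_k + 1):
        m = n - k
        sx, sxx = cum[m], cum_sq[m]                        # first n-k terms
        sy, syy = cum[n] - cum[k], cum_sq[n] - cum_sq[k]   # last n-k terms
        # the "slip" step: squared differences of terms k places apart
        sdd = sum((u[j] - u[j + k]) ** 2 for j in range(m))
        sxy = (sxx + syy - sdd) / 2                        # the identity above
        rows.append((m, k, sx, sy, sxx, syy, sdd, sxy))
    return rows


for row in moving_slip_table([-5, -6, -2, 4, 7, 3, 1, -5, -1, 2], 5):
    print(row)
```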

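We now calculate the correlation coefficients in the usual way, e.g. for $r_1$:

$$\operatorname{var} X = \frac{166}{9} - \left(\frac{-4}{9}\right)^2 = 18.247, \qquad \operatorname{var} Y = \frac{145}{9} - \left(\frac{3}{9}\right)^2 = 16.000,$$

$$\operatorname{cov}(X, Y) = \frac{84}{9} - \left(\frac{-4}{9}\right)\left(\frac{3}{9}\right) = 9.4815, \qquad r_1 = \frac{9.4815}{\sqrt{18.247 \times 16}} = 0.555;$$

and for $r_5$:

$$\operatorname{var} X = \frac{130}{5} - \left(\frac{-2}{5}\right)^2 = 25.840, \qquad \operatorname{var} Y = \frac{40}{5} - 0^2 = 8.000,$$

$$\operatorname{cov}(X, Y) = \frac{-1}{5} - \left(\frac{-2}{5}\right)(0) = -0.200, \qquad r_5 = \frac{-0.200}{\sqrt{25.840 \times 8}} = -0.01.$$

When $n$ is large and the origin is chosen so that the mean of the whole series is approximately zero, a sufficiently good value of $r_k$ is given by

$$r_k \approx \frac{\Sigma(XY)}{\{\Sigma(X^2)\,\Sigma(Y^2)\}^{1/2}},$$

the corrections required to adjust the sums of squares and products to values about the mean being small. This approximation must, however, be used with some care, and in any case the first two or three serial coefficients should be worked out exactly.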


The Correlogram

30.14. The diagram obtained by graphing $r_k$ as ordinate against $k$ as abscissa and joining each point to the next is called a correlogram. We shall give a number of examples below and shall see that the form of the correlogram provides a method of discriminating between the various types of oscillatory series.

30.15. Suppose, for example, that the series is generated by a moving average of random elements with weights $a_1, a_2, \dots, a_m$. The typical term of the series is then

$$u_j = a_1 \varepsilon_j + a_2 \varepsilon_{j+1} + \dots + a_m \varepsilon_{j+m-1}. \qquad (30.9)$$

Without loss of generality we may take $E(\varepsilon) = 0$ and hence $E(u) = 0$. Then

$$E(u_j u_{j+k}) = E\{(a_1\varepsilon_j + \dots + a_m\varepsilon_{j+m-1})(a_1\varepsilon_{j+k} + \dots + a_m\varepsilon_{j+k+m-1})\},$$

and since

$$E(\varepsilon_j \varepsilon_{j+k}) = 0, \ k \neq 0; \qquad = V, \text{ say, if } k = 0,$$

we have

$$E(u_j u_{j+k}) = V\,(a_1 a_{k+1} + a_2 a_{k+2} + \dots + a_{m-k}\,a_m), \qquad (30.10)$$

provided that $m > k$. But if $k \ge m$ then

$$E(u_j u_{j+k}) = 0. \qquad (30.11)$$

Thus for an infinite series generated by the moving average the serial correlations vanish for $k \ge m$, and from that point onwards the correlogram coincides with the x-axis. In particular, if the $a$'s are all equal to $1/m$, we have

$$E(u_j u_{j+k}) = \frac{(m-k)\,V}{m^2}, \qquad E(u_j^2) = \frac{V}{m},$$

and hence

$$r_k = 1 - \frac{k}{m}, \qquad 0 \le k \le m, \qquad (30.12)$$

so that the correlogram consists of a straight line joining the point $(0, 1)$ to $(m, 0)$, together with the x-axis from the latter point onwards.
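The result (30.10)-(30.11) turns a set of weights directly into a theoretical correlogram. Here is a small sketch that takes the weights as a Python list; `ma_correlogram` is an illustrative name. For equal weights it reproduces the straight line $1 - k/m$ followed by zeros.

```python
def ma_correlogram(weights, max_k):
    """Theoretical r_k, k = 0..max_k, of a moving average of random elements.

    r_k = sum_i a_i a_(i+k) / sum_i a_i^2, which is zero once k reaches
    the number m of weights.
    """
    m = len(weights)
    s0 = sum(w * w for w in weights)
    return [
        sum(weights[i] * weights[i + k] for i in range(m - k)) / s0 if k < m else 0.0
        for k in range(max_k + 1)
    ]


print([round(r, 3) for r in ma_correlogram([0.2] * 5, 7)])
```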
Example 30.2

The weights of the Spencer 21-point formula are

(1/350) [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60, 57, 47, 33, 18, 6, -2, -5, -5, -3, -1].

Apart from the divisor 350, which may be disregarded for present purposes, the sum of squares of the weights is 17,542. The products (30.10) and the corresponding serial correlations are as follows:


   k   Σ a_j a_(j+k)    r_k   |   k   Σ a_j a_(j+k)    r_k
   0      17,542       1.000  |  11        -930      -0.053
   1      16,786       0.957  |  12        -528      -0.030
   2      14,667       0.836  |  13        -214      -0.012
   3      11,584       0.660  |  14         -27      -0.002
   4       8,085       0.461  |  15          50       0.003
   5       4,726       0.269  |  16          59       0.003
   6       1,951       0.111  |  17          40       0.002
   7           6       0.000  |  18          19       0.001
   8      -1,074      -0.061  |  19           6       0.000
   9      -1,430      -0.082  |  20           1       0.000
  10      -1,298      -0.074  |  21           0       0.000

Fig. 30.3. Correlogram of Series generated by the Spencer 21-point Formula (Example 30.2).

The correlogram is shown in Fig. 30.3. From $k = 13$ onwards the correlations are very small, and from $k = 21$ onwards they vanish completely.
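The figures of Example 30.2 can be checked mechanically. This sketch takes the Spencer weights as listed above and drops the divisor 350, which cancels in $r_k$. The helper name is illustrative.

```python
# Spencer 21-point weights with the common divisor 350 omitted
SPENCER_21 = [-1, -3, -5, -5, -2, 6, 18, 33, 47, 57, 60,
              57, 47, 33, 18, 6, -2, -5, -5, -3, -1]


def lagged_product_sum(weights, k):
    """Sum of a_i * a_(i+k) over the available pairs: the bracket in (30.10)."""
    return sum(weights[i] * weights[i + k] for i in range(len(weights) - k))


sums = [lagged_product_sum(SPENCER_21, k) for k in range(len(SPENCER_21) + 1)]
r = [s / sums[0] for s in sums]
for k, (s, rk) in enumerate(zip(sums, r)):
    print(f"{k:2d} {s:7d} {rk:7.3f}")
```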
30.16. Suppose now that the series consists of a sine term $A \sin\theta t$ plus $\varepsilon_t$, a random residual. As before, we may suppose $E(\varepsilon_t) = 0$, and hence

$$E(u_j u_{j+k}) = E\{(A\sin\theta j + \varepsilon_j)(A\sin\theta(j+k) + \varepsilon_{j+k})\} = A^2 E\{\sin\theta j\,\sin\theta(j+k)\}$$

$$= \frac{A^2}{n}\sum_{j=1}^{n}\sin\theta j\,\sin\theta(j+k) \qquad (30.13)$$

$$= \frac{A^2}{2n}\sum_{j=1}^{n}\{\cos\theta k - \cos\theta(2j+k)\} = \frac{A^2}{2}\cos\theta k - \frac{A^2}{2n}\,\frac{\cos\theta(k+n+1)\,\sin n\theta}{\sin\theta}. \qquad (30.14)$$

Thus for large $n$ we have effectively, unless $\theta$ is small,

$$E(u_j u_{j+k}) = \frac{A^2}{2}\cos\theta k = B\cos\theta k, \text{ say.} \qquad (30.15)$$

Similarly we find

$$E(u_j^2) = B + \operatorname{var}\varepsilon = C, \text{ say.} \qquad (30.16)$$

Hence

$$r_k = \frac{B}{C}\cos\theta k, \qquad k > 0. \qquad (30.17)$$

In short, for an infinite cyclical series the correlogram itself is a harmonic with period equal to that of the original harmonic component.
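A quick simulation illustrates (30.17). The sketch below assumes $A = 10$, a period of 10, and a rectangular residual on $(-5, 5)$, which is the scheme of Table 30.3. It uses a long series of 20,000 terms so that end effects are negligible; the seed and length are arbitrary choices. Since the variance of a rectangular variable on $(-h, h)$ is $h^2/3$, the expected correlogram is $(B/C)\cos\theta k$ with $B = 50$ and $C = 50 + 25/3$.

```python
import math
import random


def serial_r(u, k):
    """Serial correlation of order k from the n - k pairs (u[j], u[j+k])."""
    n = len(u)
    x, y = u[: n - k], u[k:]
    m = n - k
    mx, my = sum(x) / m, sum(y) / m
    cov = sum(p * q for p, q in zip(x, y)) / m - mx * my
    vx = sum(p * p for p in x) / m - mx * mx
    vy = sum(q * q for q in y) / m - my * my
    return cov / math.sqrt(vx * vy)


A, theta, h, n = 10.0, 2 * math.pi / 10, 5.0, 20000
rng = random.Random(2)
u = [A * math.sin(theta * t) + rng.uniform(-h, h) for t in range(1, n + 1)]

B = A * A / 2          # coefficient of cos(theta k) for large n
C = B + h * h / 3      # E(u^2): B plus the variance of the rectangular residual
theory = [B / C * math.cos(theta * k) for k in range(1, 21)]
observed = [serial_r(u, k) for k in range(1, 21)]
for k, (o, e) in enumerate(zip(observed, theory), start=1):
    print(f"{k:2d} observed {o:6.3f} expected {e:6.3f}")
```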

30.17. When the original series is the sum of several harmonic terms the formula for $r_k$ will, in general, be the sum of harmonics, not necessarily with the same periods. Thus the correlogram will present a sinusoidal form which will not degenerate to the x-axis after some fixed point and will not, in fact, be damped.

30.18. Consider now the series defined by (30.3), namely

$$u_{t+2} + a\,u_{t+1} + b\,u_t = \varepsilon_{t+2}. \qquad (30.18)$$

This is a difference equation which is easily solved by the usual methods.* The general solution of

$$u_{t+2} + a\,u_{t+1} + b\,u_t = 0 \qquad (30.19)$$

is

$$u_t = p^t(A\cos\theta t + B\sin\theta t), \qquad p = \sqrt{b}, \qquad \cos\theta = -\frac{a}{2\sqrt{b}}. \qquad (30.20)$$

Here $\sqrt{b}$ is to be taken with positive sign, and it is assumed that $4b > a^2$. We also assume that $\sqrt{b}$ is not greater than unity. The contrary case is mathematically permissible, but it implies that $u_t$ increases without limit, which is outside the domain of our consideration.

* See, for instance, Milne-Thomson, Calculus of Finite Differences, chapter 13.
Consider now the series

$$u_t = \sum_{j=0}^{t}\xi_{j+1}\,\varepsilon_{t-j}, \qquad (30.21)$$

where $\xi_t$ is a particular solution of (30.19) such that $\xi_0 = 0$ and $\xi_1 = 1$, i.e. such that

$$\xi_t = \frac{p^{t-1}\sin\theta t}{\sin\theta}. \qquad (30.22)$$

On substituting (30.21) in the original equation it will be found to provide a particular solution. The general solution is then

$$u_t = p^t(A\cos\theta t + B\sin\theta t) + \sum_{j=0}^{t}\xi_{j+1}\,\varepsilon_{t-j}. \qquad (30.23)$$

As $p$ is not greater than unity we shall, in general, find that the first term in this expression is damped out of existence. If we may regard our series as having been "started up" some time prior to the point $t = 0$, the solution is effectively

$$u_t = \sum_{j=0}^{\infty}\xi_{j+1}\,\varepsilon_{t-j}. \qquad (30.24)$$


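30.19. In this form the autoregressive scheme is seen to be a moving average of a random component $\varepsilon$, of infinite extent and with damped harmonic weights. Consider now its correlogram. Since $\xi_0 = 0$, we have

$$\xi_j\,\xi_{j+k} = \frac{p^{2j+k-2}\sin\theta j\,\sin\theta(j+k)}{\sin^2\theta}.$$

Now $\sin\theta j\,\sin\theta(j+k) = \tfrac{1}{2}\{\cos\theta k - \cos\theta(2j+k)\}$ and $\sin^2\theta = (4b - a^2)/4b$. Thus

$$\sum_{j=0}^{\infty}\xi_j\,\xi_{j+k} = \frac{2p^k}{4b-a^2}\sum_{j=0}^{\infty}p^{2j}\{\cos\theta k - \cos\theta(2j+k)\} = \frac{2p^k}{4b-a^2}\left\{\frac{\cos\theta k}{1-p^2} - \frac{\cos\theta k - p^2\cos\theta(k-2)}{1 - 2p^2\cos2\theta + p^4}\right\}. \qquad (30.25)$$

Also, from (30.24),

$$E(u_j u_{j+k}) = E\Big\{\sum_i \xi_{i+1}\varepsilon_{j-i}\sum_l \xi_{l+1}\varepsilon_{j+k-l}\Big\} = \operatorname{var}\varepsilon\sum_{i=0}^{\infty}\xi_i\,\xi_{i+k}, \qquad (30.26)$$

and in particular $\operatorname{var} u = \operatorname{var}\varepsilon\sum_{i=0}^{\infty}\xi_i^2$. Hence

$$r_k = \frac{\sum_{j=0}^{\infty}\xi_j\,\xi_{j+k}}{\sum_{j=0}^{\infty}\xi_j^2},$$

which, on substitution from (30.25), reduces to

$$r_k = \frac{p^k\{\sin(k+1)\theta - p^2\sin(k-1)\theta\}}{(1+p^2)\sin\theta}. \qquad (30.27)$$

Writing

$$\tan\psi = \frac{1+p^2}{1-p^2}\tan\theta, \qquad (30.28)$$

we find

$$r_k = \frac{p^k\sin(k\theta + \psi)}{\sin\psi}. \qquad (30.29)$$

From this we see that the correlogram will oscillate with period $2\pi/\theta$ but that, owing to the factor $p^k$, it will be damped. If $k$ is negative the formula applies, except that $|k|$ must be used instead of $k$ on the right-hand side of (30.29).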
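The reduction from (30.25)-(30.26) to (30.29) can be checked numerically. The sketch below uses illustrative function names and computes $r_k$ in two ways. The first is the closed form $p^k\sin(k\theta+\psi)/\sin\psi$. The second sums the products of the weights $\xi_j$, generated from the homogeneous equation with $\xi_0 = 0$, $\xi_1 = 1$ and truncated after 2,000 terms. The two should agree to rounding error. The sketch assumes the damped oscillatory case $0 < b < 1$, $a^2 < 4b$, and uses `atan2` so that $\psi$ lies in the quadrant giving $\sin\psi > 0$.

```python
import math


def ar2_correlogram(a, b, max_k):
    """r_k = p^k sin(k*theta + psi) / sin(psi), k = 0..max_k.

    Assumes 0 < b < 1 and a^2 < 4b.
    """
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    # tan(psi) = (1 + p^2) / (1 - p^2) * tan(theta), with sin(psi) > 0
    psi = math.atan2((1 + b) * math.sin(theta), (1 - b) * math.cos(theta))
    return [p ** k * math.sin(k * theta + psi) / math.sin(psi) for k in range(max_k + 1)]


def ar2_correlogram_from_weights(a, b, max_k, terms=2000):
    """The same r_k as sum xi_j xi_(j+k) / sum xi_j^2 over truncated weights."""
    xi = [0.0, 1.0]                            # xi_0 = 0, xi_1 = 1
    while len(xi) < terms + max_k:
        xi.append(-a * xi[-1] - b * xi[-2])    # homogeneous difference equation
    s0 = sum(x * x for x in xi[:terms])
    return [sum(xi[j] * xi[j + k] for j in range(terms)) / s0 for k in range(max_k + 1)]


closed = ar2_correlogram(-1.1, 0.5, 10)
summed = ar2_correlogram_from_weights(-1.1, 0.5, 10)
for k, (x, y) in enumerate(zip(closed, summed)):
    print(f"{k:2d} {x:8.5f} {y:8.5f}")
```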

30.20. We thus reach the interesting conclusion that the three types of series considered in 30.9, however similar to the eye, will have distinct types of correlogram, provided that the series are long enough for the observed correlations to approach the expected values for an infinite series. The correlogram of a series generated by moving averages, though it may oscillate as in Example 30.2, will vanish after a certain point. That of a series of harmonic terms will oscillate, but will not vanish or be damped. That of the autoregressive scheme will oscillate and will not vanish, but it will be damped. The correlogram therefore offers a theoretical basis for discriminating between the three types of oscillatory series.

30.21. Unfortunately the series with which we have to work are very frequently 
too short to enable a decisive distinction to be made. We shall see below that divergence 
between theory and observation can be very considerable, and that sampling theory has 
not yet advanced far enough to enable us to make objective judgments in probability 
about its significance. We shall have to rely on limited experimental evidence and to 
some extent on intuitive judgment in reaching conclusions. If, therefore, the remainder 
of this chapter contains gaps in the treatment and leaves certain points undecided the 
reader will understand that the reason is ignorance rather than indifference. 

Examples of Correlograms from Observed Series

30.22. We will in the first place give the correlograms of a few of the series given 
earlier in this and the preceding chapter. 

Example 30.3

In Table 30.2 we gave the deviations from the trend of marriage rates for the years 1843-1896. The first 20 serial correlations of this series are shown in Table 30.5 and the correlogram in Fig. 30.4.
TABLE 30.5

Serial Correlations of the Marriage Data of Table 30.2.

 Order of                 | Order of
 Correlation  Correlation | Correlation  Correlation
     k            r_k     |     k            r_k
     1             …      |    11          0.080
     2          -0.089    |    12         -0.130
     3          -0.498    |    13         -0.132
     4           0.031    |    14         -0.058
     5          -0.407    |    15         -0.096
     6          -0.025    |    16         -0.120
     7           0.353    |    17         -0.036
     8           0.396    |    18          0.131
     9           0.254    |    19          0.209
    10           0.104    |    20          0.205

Fig. 30.4. Correlogram of Marriage Data of Table 30.2 (Table 30.5).


The correlogram is smooth and suggests the operation of an autoregressive scheme. 
There is little indication that a moving average, at least of extent less than 20, would account 
for the series, but on the other hand some damping appears to be present. 

Example 30.4 

Table 30.6 shows the first 60 serial correlations of the Beveridge series of Table 30.1, the correlogram being given in Fig. 30.5.
TABLE 30.6

Serial Correlations of the Beveridge Wheat-Price Index of Table 30.1.

 (k = order of correlation)

   k     r_k   |   k     r_k   |   k     r_k   |   k     r_k
   1   0.662   |  16   0.158   |  31   0.060   |  46  -0.036
   2   0.103   |  17   0.109   |  32  -0.008   |  47  -0.013
   3  -0.075   |  18   0.002   |  33  -0.039   |  48   0.042
   4  -0.092   |  19  -0.075   |  34   0.007   |  49   0.062
   5  -0.082   |  20  -0.062   |  35   0.056   |  50   0.065
   6  -0.136   |  21  -0.021   |  36   0.010   |  51   0.050
   7  -0.211   |  22  -0.062   |  37  -0.004   |  52   0.009
   8  -0.261   |  23  -0.088   |  38  -0.015   |  53  -0.027
   9  -0.192   |  24  -0.084   |  39  -0.047   |  54  -0.053
  10  -0.070   |  25  -0.076   |  40  -0.047   |  55  -0.073
  11  -0.003   |  26  -0.091   |  41   0.008   |  56  -0.106
  12  -0.015   |  27  -0.052   |  42   0.034   |  57  -0.084
  13  -0.012   |  28  -0.032   |  43   0.065   |  58  -0.019
  14   0.047   |  29  -0.012   |  44   0.099   |  59   0.003
  15   0.101   |  30   0.059   |  45   0.009   |  60   0.010



The correlogram here is almost certainly damped. The oscillations persist in a most 
remarkable way, notwithstanding the diminishing amplitude, and the presumption is 
a strong one that the series is of the damped type. 
Example 30.5

In Table 29.8 (page 386) we gave the residuals of a sheep-population series for the 
years 1871 to 1936. Table 30.7 shows the first 30 serial correlations of this series and 
Fig. 30.6 the correlogram. Again the correlogram is oscillatory, but the damping is not 
so clear. 


TABLE 30.7

Serial Correlations of the Sheep Data of Table 29.8.

 (k = order of correlation)

   k     r_k   |   k     r_k   |   k     r_k
   1   0.595   |  11  -0.142   |  21  -0.381
   2  -0.151   |  12  -0.172   |  22  -0.118
   3  -0.601   |  13  -0.186   |  23   0.173
   4  -0.537   |  14  -0.128   |  24   0.343
   5  -0.138   |  15   0.052   |  25   0.352
   6   0.144   |  16   0.276   |  26   0.154
   7   0.203   |  17   0.439   |  27  -0.203
   8   0.118   |  18   0.293   |  28  -0.456
   9   0.006   |  19  -0.074   |  29  -0.415
  10  -0.078   |  20  -0.359   |  30  -0.184

Fig. 30.6. Correlogram of the Sheep Population Data of Table 29.8 (Table 30.7).
Significance of a Correlogram

30.23. The foregoing examples illustrate one of the main difficulties we have to face 
in correlogram analysis. On intuitive grounds we seem to be justified in rejecting the 
scheme of moving averages as a possible scheme for the series of these examples, since the 
oscillations in the correlograms persist ; but we can no doubt find moving averages which 
will produce such correlograms, though their extents would have to be long (over 60 in 
the case of the Beveridge series) and their weights artificial. The only final test seems to 
be to ascertain such a moving average and then to examine whether it will predict further 
terms in the series if such can be observed. 


30.24. Distinction between the scheme of harmonic components and the autoregressive scheme is even more difficult for short series, since the correlograms for the latter do not damp out according to expectation. Consider in fact an autoregressive scheme of the simple linear type (30.3). There will be the usual variation in length from peak to peak and in amplitude. But if the section of the series is a comparatively short one, covering, say, four or five oscillations, the oscillations will not have time to get very much out of step, and the serial correlations will be systematically larger than one would expect for an infinite series. This effect is exhibited in Table 30.8 and Fig. 30.7, which give the serial correlations and the correlogram for the series of Table 30.4, given by the formula

$$u_{t+2} = 1.1\,u_{t+1} - 0.5\,u_t + \varepsilon_{t+2}.$$

Here the damping factor is $p = \sqrt{b} = 0.7071$, and by the thirtieth correlation $r_k$ should be very small, less than 0.002 in absolute magnitude. Actually it is 100 times as large. The mere fact that an observed correlogram for a short series fails to damp very rapidly is not, therefore, a very definite indication that the series is not ruled by the autoregressive scheme. On the contrary, failure to damp may be expected.
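The contrast described in 30.24 can be reproduced in outline. The sketch computes the theoretical $r_k$ of the scheme (30.4) from (30.29). Beside it, it prints the serial correlations of one simulated series of 65 terms, the length of Table 30.4. The rectangular residual on $(-9.5, 9.5)$, the seed and the 50-term burn-in are assumptions, so the sample column will differ from Table 30.8. At high lags, however, it should typically remain far larger than the theoretical values.

```python
import math
import random


def theoretical_r(a, b, k):
    """r_k = p^k sin(k*theta + psi) / sin(psi) for 0 < b < 1, a^2 < 4b."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    psi = math.atan2((1 + b) * math.sin(theta), (1 - b) * math.cos(theta))
    return p ** abs(k) * math.sin(abs(k) * theta + psi) / math.sin(psi)


def serial_r(u, k):
    """Serial correlation of order k from the n - k pairs (u[j], u[j+k])."""
    n = len(u)
    x, y = u[: n - k], u[k:]
    m = n - k
    mx, my = sum(x) / m, sum(y) / m
    cov = sum(p * q for p, q in zip(x, y)) / m - mx * my
    vx = sum(p * p for p in x) / m - mx * mx
    vy = sum(q * q for q in y) / m - my * my
    return cov / math.sqrt(vx * vy)


rng = random.Random(4)
u = [0.0, 0.0]
for _ in range(65 + 50):                 # 50 burn-in terms, then 65 kept
    u.append(1.1 * u[-1] - 0.5 * u[-2] + rng.uniform(-9.5, 9.5))
u = u[-65:]

for k in (1, 5, 10, 20, 30):
    print(f"k={k:2d}  theory {theoretical_r(-1.1, 0.5, k):9.5f}  sample {serial_r(u, k):6.2f}")
```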

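30.25. We are on firmer ground when considering the significance of a correlogram in the sense of judging whether it can be derived from a random series.

(a) The variance of $r_k$ in a random series of $n$ terms is approximately $1/(n-k)$, provided that $n$ is large. For, measuring from the mean,

$$r_k \approx \frac{\frac{1}{n-k}\sum_{j=1}^{n-k}x_j\,x_{j+k}}{\operatorname{var} x},$$

and when the terms are independent the $n - k$ products $x_j x_{j+k}$ are uncorrelated, each with variance $\operatorname{var}^2 x$. Hence, for large samples,

$$\operatorname{var} r_k = \frac{1}{(n-k)^2}\,\frac{(n-k)\operatorname{var}^2 x}{\operatorname{var}^2 x} = \frac{1}{n-k}. \qquad (30.30)$$

R. L. Anderson (1942) has recently given exact results for the significance of a serial correlation.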
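The approximation (30.30) is easy to check by simulation. This sketch draws independent rectangular series and compares the observed variance of $r_k$ with $1/(n-k)$. The distribution, $n = 100$, $k = 5$, the 2,000 replications and the seed are all arbitrary assumptions. Agreement should be within a few per cent. The remaining difference reflects both sampling error and the large-sample nature of the formula.

```python
import math
import random
import statistics


def serial_r(u, k):
    """Serial correlation of order k from the n - k pairs (u[j], u[j+k])."""
    n = len(u)
    x, y = u[: n - k], u[k:]
    m = n - k
    mx, my = sum(x) / m, sum(y) / m
    cov = sum(p * q for p, q in zip(x, y)) / m - mx * my
    vx = sum(p * p for p in x) / m - mx * mx
    vy = sum(q * q for q in y) / m - my * my
    return cov / math.sqrt(vx * vy)


def variance_of_r(n, k, reps=2000, seed=3):
    """Monte Carlo variance of r_k over independent random series of n terms."""
    rng = random.Random(seed)
    values = [serial_r([rng.uniform(-1.0, 1.0) for _ in range(n)], k) for _ in range(reps)]
    return statistics.pvariance(values)


n, k = 100, 5
v = variance_of_r(n, k)
print(f"simulated {v:.5f}   1/(n-k) = {1 / (n - k):.5f}")
```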

(b) For our purposes, however, the important point is not whether a particular serial coefficient is significant, but whether the oscillatory character of the correlogram as a whole is so. Here we have to form an intuitive judgment, but it can hardly be doubted that the undulations in Figs. 30.4 to 30.6 are not accidental. Something exists to be explained as a systematic effect, though what that effect is may be more difficult to decide.

TABLE 30.8

Serial Correlations of the Artificial Series of Table 30.4.

 (k = order of correlation)

   k    r_k  |   k    r_k  |   k    r_k
   1   0.70  |  11  -0.05  |  21   0.05
   2   0.29  |  12  -0.17  |  22  -0.12
   3   0.01  |  13   0.27  |  23  -0.28
   4  -0.17  |  14   0.31  |  24  -0.43
   5  -0.27  |  15  -0.30  |  25  -0.57
   6  -0.25  |  16  -0.18  |  26  -0.56
   7  -0.13  |  17   0.12  |  27  -0.20
   8   0.07  |  18   0.29  |  28   0.02
   9   0.12  |  19   0.33  |  29   0.17
  10   0.05  |  20   0.22  |  30   0.27

30.26. We shall proceed to study the autoregressive scheme and the scheme of 
cyclical components in more detail, without prejudice for the time being to the question
as to which is the better representation in particular cases. This latter is not, in fact, 
entirely a statistical matter, and we shall return to it in 30.39. 
The Autoregressive Scheme

30.27. We consider in the first instance the simplified scheme of equation (30.3). The theoretical correlogram for a series generated by this equation is of the damped type given by (30.29),

$$r_k = \frac{p^k\sin(k\theta + \psi)}{\sin\psi},$$

where $2\pi/\theta$ is the autoregressive period of the regression equation and is given by

$$\cos\theta = -\frac{a}{2\sqrt{b}}.$$

The typical series of this kind has no "period" in the strict sense. The lengths from peak to peak or from upcross to upcross vary in the characteristic way. It appears from experiment (but has not, I think, been shown theoretically) that the distribution of distances from peak to peak is of the unimodal type, with a central value somewhere near the mean distance between peaks; and similarly for troughs and upcrosses. In speaking of the "period" of an autoregressive series we mean the central value of one of these distributions. The question we have now to consider is whether this period is the same as the autoregressive period $2\pi/\theta$ of the regression equation.


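30.28. We have seen in 29.26 that the mean distance between upcrosses of the series generated by the moving average whose weights are $a_1, \dots, a_m$ is given by $2\pi/\phi$, say, where

$$\cos\phi = \frac{\sum_{j=1}^{m-1}a_j\,a_{j+1}}{\sum_{j=1}^{m}a_j^2}.$$

For the autoregressive scheme the weights are the $\xi_j$ of (30.24). Substituting for $\xi$ from (30.22) and using (30.25), we find

$$\cos\phi = \frac{p\left\{\dfrac{\cos\theta}{1-p^2} - \dfrac{(1-p^2)\cos\theta}{1-2p^2\cos2\theta+p^4}\right\}}{\dfrac{1}{1-p^2} - \dfrac{1-p^2\cos2\theta}{1-2p^2\cos2\theta+p^4}} = \frac{2p\cos\theta}{1+p^2} = -\frac{a}{1+b}. \qquad (30.31)$$

Thus the mean period as defined by upcrosses is

$$2\pi\Big/\arccos\left(-\frac{a}{1+b}\right), \qquad (30.32)$$

whereas the autoregressive period of the equation is

$$2\pi\Big/\arccos\left(-\frac{a}{2\sqrt{b}}\right). \qquad (30.33)$$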
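Expressions (30.32) and (30.33) are simple enough to tabulate. The sketch below evaluates both periods for a few of the $(a, b)$ pairs used in this chapter; the function names are illustrative. It assumes $|a| < 1 + b$ and $a^2 < 4b$, so that both arc-cosines are defined.

```python
import math


def upcross_period(a, b):
    """Mean distance between upcrosses, 2*pi / arccos(-a / (1 + b))."""
    return 2 * math.pi / math.acos(-a / (1 + b))


def autoregressive_period(a, b):
    """Period of the regression equation, 2*pi / arccos(-a / (2*sqrt(b)))."""
    return 2 * math.pi / math.acos(-a / (2 * math.sqrt(b)))


for a, b in [(-1.0, 0.5), (-1.2, 0.4), (-1.06, 0.782)]:
    p_up, p_ar = upcross_period(a, b), autoregressive_period(a, b)
    print(f"a={a:5.2f} b={b:5.3f}  upcross {p_up:5.2f}  "
          f"autoregressive {p_ar:5.2f}  ratio {p_ar / p_up:4.2f}")
```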
30.29. The mean period between upcrosses is thus not the same as the autoregressive period. The two are very close for many of the values of $a$ and $b$ arising in practice. For instance, when $b = 1$ they are identical, and when $a = -1$, $b = 0.5$ their ratio is 1.07. One might infer that an estimate of the period of an autoregressive scheme can be obtained from the correlogram, but this generalisation requires some important qualifications.

(a) Firstly, the ratio of (30.33) to (30.32) is not necessarily close to unity for values of $b$ in the neighbourhood of $a^2/4$, i.e. when $\theta$ is small and the autoregressive period is long. Consider, for instance, the series generated by

$$u_{t+2} = 1.2\,u_{t+1} - 0.4\,u_t + \varepsilon_{t+2}.$$

We have

$$\cos\theta = -\frac{a}{2\sqrt{b}} = \frac{1.2}{2\sqrt{0.4}} = 0.9487, \qquad \theta = 18.4^\circ, \qquad \text{period} = 19.5 \text{ units}.$$

However, for $\phi$,

$$\cos\phi = \frac{1.2}{1.4} = 0.8571, \qquad \phi = 31^\circ, \qquad \text{period} = 11.6 \text{ units}.$$

The mean distance between upcrosses, and a fortiori that between peaks, is very much shorter than the autoregressive period.

(b) The mean distance between upcrosses may miss certain oscillations above or below the x-axis, so that it overestimates the period between peaks or troughs. On the other hand, the latter may include ripples on the main wave which we wish to ignore. The reader can verify for himself, by constructing an autoregressive series by some such formula as the above, how difficult it is to draw the line in particular cases. The difficulty, however, must be faced, for it is precisely the kind which we meet in dealing with observed series.

(c) Owing to the appearance of the phase angle $\psi$ in equation (30.29), the starting-point of the correlogram ($k = 0$) is not to be regarded as a maximum. The period of the correlogram is therefore to be calculated either by ignoring this point or by reference to distances between troughs and upcrosses in the correlogram.


30.30. The equation

$$u_{t+2} = -a\,u_{t+1} - b\,u_t + \varepsilon_{t+2}$$

may be regarded as expressing the regression of $u_{t+2}$ on $u_{t+1}$ and $u_t$, the term $\varepsilon_{t+2}$ being a residual error. We may therefore estimate the constants $a$ and $b$ from the regression equation of the observed series in the usual way. If we assume that the series is long enough for end effects to be negligible in determining the variances of the finite series, then $\operatorname{var} u_{t+2} = \operatorname{var} u_{t+1} = \operatorname{var} u_t$, and from the usual formulae for regressions we find

$$-a = \frac{r_1(1 - r_2)}{1 - r_1^2}, \qquad (30.34)$$

$$-b = \frac{r_2 - r_1^2}{1 - r_1^2}. \qquad (30.35)$$

This gives us the constants of the autoregressive scheme from the serial correlations.

It should, however, be realised that these estimates are rather sensitive to superposed error of the type we refer to below (30.32), and it is therefore unsafe to estimate the autoregressive period from them. The correlogram itself appears to be a safer guide on this matter.
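Equations (30.34) and (30.35) amount to solving the two normal equations of the regression. Here is a minimal sketch with a hypothetical function name, applied to the sheep-data values $r_1 = 0.595$, $r_2 = -0.151$ of Table 30.7.

```python
def regression_constants(r1, r2):
    """Constants a, b of u[t+2] + a u[t+1] + b u[t] = eps from r_1 and r_2.

    -a = r1 (1 - r2) / (1 - r1^2),   -b = (r2 - r1^2) / (1 - r1^2).
    """
    d = 1 - r1 * r1
    return -r1 * (1 - r2) / d, -(r2 - r1 * r1) / d


a, b = regression_constants(0.595, -0.151)
print(f"a = {a:.3f}, b = {b:.3f}")
```

As a consistency check, the fitted constants return the first serial correlation exactly through $r_1 = -a/(1+b)$.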


Example 30.6

Consider again the sheep data of Table 30.7 and Fig. 30.6. Suppose we have decided, from the appearance of the correlogram, to attempt to represent the series by an autoregressive scheme.

In the first place, we have to inquire whether a scheme of the simple linear form (30.3) is likely to be adequate. Would it, for example, be better to consider the more general form

$$u_{t+m} + a_1 u_{t+m-1} + \dots + a_m u_t = \varepsilon_{t+m},$$

or need we take into account curvilinear regressions?

The first point can be elucidated by the use of partial and multiple correlations. The following are the partial coefficients, together with the function of the multiple correlation $1 - R^2$ as determined by the continued product of $(1 - r^2)$ (cf. vol. I, equation 15.45, p. 380):


 Order of Partial   Value of Partial   Π(1 - r²)
 Correlation.       Correlation.
 r_12                 0.595             0.6460
 r_13.2              -0.782             0.2509
 r_14.23              0.097             0.2485
 r_15.234             0.183             0.2402
 r_16.2345            0.031             0.2400
 r_17.23456           0.014             0.2400


Evidently no appreciable gain in representation is to be obtained by taking the regression 
on more than the two preceding terms. 
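The partial coefficients in this table can be generated from the serial correlations alone. The sketch below uses the Durbin-Levinson recursion, a standard method for fitting successive autoregressions. The text does not describe it; it is substituted here for the procedure of vol. I. The input is the first six entries of Table 30.7. With these inputs the sketch should reproduce $r_{12}$, $r_{13.2}$, $r_{14.23}$ and the first two entries of the last column to within rounding. The function name is illustrative.

```python
def partial_serial_correlations(r):
    """Partial correlations r_12, r_13.2, r_14.23, ... from r = [r_1, r_2, ...].

    Durbin-Levinson recursion: phi holds the coefficients of the regression
    of u_t on its m - 1 predecessors; the last coefficient of each new
    regression is the partial correlation of the next order.
    """
    phi, partials = [], []
    for m in range(1, len(r) + 1):
        num = r[m - 1] - sum(phi[j] * r[m - 2 - j] for j in range(m - 1))
        den = 1 - sum(phi[j] * r[j] for j in range(m - 1))
        last = num / den
        phi = [phi[j] - last * phi[m - 2 - j] for j in range(m - 1)] + [last]
        partials.append(last)
    return partials


r = [0.595, -0.151, -0.601, -0.537, -0.138, 0.144]   # Table 30.7, k = 1..6
partials = partial_serial_correlations(r)
residual = []              # continued product of (1 - r^2), i.e. 1 - R^2
prod = 1.0
for p in partials:
    prod *= 1 - p * p
    residual.append(prod)
    print(f"{p:7.3f} {prod:7.4f}")
```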

The possibility of a better representation by taking curvilinear regressions may be considered by drawing the scatter diagrams of $u_t$ on $u_{t+1}$ and on $u_{t+2}$. These are shown in Fig. 30.8. It seems clear that there is an essential scatter in the data which no ordinary polynomial can represent, and that curvilinear terms are unlikely to add anything material to the linear regressions.

We conclude that, if the data are of the autoregressive type, it is unnecessary to consider any more elaborate scheme than the simple type

$$u_{t+2} + a\,u_{t+1} + b\,u_t = \varepsilon_{t+2}.$$

For this series we have

$$r_1 = 0.595, \qquad r_2 = -0.151.$$

Hence, from (30.34) and (30.35),

$$-a = 1.060, \qquad -b = -0.782.$$

Fig. 30.8. Scatter Diagrams of $u_t$ on $u_{t+1}$ (top figure), and $u_t$ on $u_{t+2}$ (bottom figure).

The autoregression equation is

$$u_{t+2} = 1.060\,u_{t+1} - 0.782\,u_t + \varepsilon_{t+2}.$$

For the autoregressive period we have

$$\cos\theta = \frac{1.060}{2\sqrt{0.782}} = 0.600, \qquad \theta = 53.2^\circ,$$

and hence the period is $360/53.2 = 6.8$ years.


Now in the correlogram (Fig. 30.6) there are peaks at $k = 7$, 17 and 25, giving a period of about 9 years; and there are troughs at $k = 3$, 13, 21 and 28, giving a mean period of 8.3 years. The autoregressive period as estimated from the correlogram is therefore between 8 and 9 years, whereas that given by the autoregression equation is 6.8 years, considerably shorter.

Using the values of $a$ and $b$ found above, we have for the mean distance between upcrosses

$$\cos\phi = \frac{1.060}{1.782} = 0.5948, \qquad \phi = 53.5^\circ,$$

giving a mean distance practically equal to the autoregressive period as shown by the regression equation.

Finally, looking to the original series, we see that there are nine major peaks, the first in 1874 and the last in 1932, so that the mean distance between peaks is $58/8 = 7.25$ years. There are also nine upcrosses, the first between 1872 and 1873 and the last between 1930 and 1931, so that the mean distance between upcrosses is $58/8 = 7.25$ years, the same as for peaks. The upcross at 1876-7, however, is due to a temporary fall below the zero line, and had it not occurred we should have found a mean distance of 8.3 years.

We have therefore reached this position: the mean period in the series itself appears to be about 7.25 years; that given by the regression constants is 6.8 years; and that given by the correlogram is about 8.5 years. These figures are scarcely close enough for comfort,
and further data would be required to arrive at a more accurate estimate of the mean 
period. Nevertheless, they illustrate very well the kind of divergence which appears to 
be more the rule than the exception in dealing with short series. We should expect the 
correlogram to give a higher value than the series itself, for there may appear peaks or 
upcrosses in the latter which are purely temporary fluctuations due to the casual element. 
On the other hand, the regression constants appear to give consistently lower values for 
the autoregressive period than the correlogram, an effect found by Yule (1927a) for sunspots, 
Wold (1938a) for cost-of-living indices, and Kendall (1944a) in series of agricultural prices, 
acreage and livestock populations. 


30.31. Let us examine more closely the effect referred to at the end of the previous example. Our autoregressive system is based on a random element $\varepsilon_t$ which is added to the term $-a\,u_{t+1} - b\,u_t$. We can therefore regard the value at time $t + 2$ as composed of two parts. The first is a systematic element, expressed by $-a\,u_{t+1} - b\,u_t$, giving the effect of the past history of the system at times $t + 1$ and $t$. The second is a new random element peculiar to the moment. This latter is random in the sense that it is casual and unpredictable; but once it has occurred it is incorporated into the motion of the system and exerts an influence on future history. It is therefore quite unlike an error of observation or a sampling error, which distorts the value of a particular member but does not affect the others.

Now suppose that such an error of observation is present, and let us represent it by $\eta$. For long series this element will increase the variance of the observed values by $\operatorname{var}\eta$, but if it is independent of the remaining constituents of the series it will not affect the covariances. Hence the serial correlations will all be reduced in a constant proportion $c = \operatorname{var} u/(\operatorname{var} u + \operatorname{var}\eta)$, except of course $r_0$. As we proceed to show, this will affect the autoregressive period as derived from the regression constants, in general shortening the period quite considerably.


y 


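30.32. If $r_1$ is reduced to $c r_1$ and $r_2$ to $c r_2$, the constants of the regression equations are, from (30.34) and (30.35),

$$a' = -\frac{c\,r_1(1 - c\,r_2)}{1 - c^2 r_1^2}, \qquad b' = \frac{c^2 r_1^2 - c\,r_2}{1 - c^2 r_1^2}. \qquad (30.36)$$

The estimated autoregressive period then corresponds to the angle $\theta'$ given by

$$\cos\theta' = -\frac{a'}{2\sqrt{b'}} = \frac{c\,r_1(1 - c\,r_2)}{2\sqrt{(1 - c^2 r_1^2)(c^2 r_1^2 - c\,r_2)}}. \qquad (30.37)$$

Differentiating the logarithm of this expression and putting $c = 1$, we find

$$-\tan\theta'\,\frac{d\theta'}{dc} = 1 - \frac{r_2}{1 - r_2} + \frac{r_1^2}{1 - r_1^2} - \frac{2r_1^2 - r_2}{2(r_1^2 - r_2)}.$$

On substituting $r_1 = -a/(1+b)$ and $r_2 = (a^2 - b - b^2)/(1+b)$, this reduces to

$$-\tan\theta'\,\frac{d\theta'}{dc} = \frac{(1+b)(3b^2 + b - a^2)}{2b\{(1+b)^2 - a^2\}}.$$

At $c = 1$ the angle $\theta'$ is simply $\theta$, and the period is $P = 2\pi/\theta$. We then find

$$\left(\frac{dP}{dc}\right)_{c=1} = \frac{P^2(1+b)(3b^2 + b - a^2)}{4\pi b\{(1+b)^2 - a^2\}\tan\theta}. \qquad (30.38)$$

Now $\tan\theta = \sqrt{4b - a^2}/(-a)$, so that

$$\left(\frac{dP}{dc}\right)_{c=1} = -\frac{P^2 a(1+b)(3b^2 + b - a^2)}{4\pi b\{(1+b)^2 - a^2\}\sqrt{4b - a^2}}. \qquad (30.39)$$

This equation gives us an approximate idea of the change in the period $P$ for small changes in $c$ near $c = 1$. For instance, with $a = -1.5$, $b = 0.9$ we find $P = 9.7$ units, and from (30.39)

$$\frac{dP}{dc} \simeq 16.5.$$

Thus, if $c = 0.9$, i.e. if the variance of $\eta$ is about 10 per cent of the total, the period will be reduced by about 1.65 years, a substantial amount.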
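Equations (30.36) to (30.39) can be checked numerically. The sketch below computes the period implied by the regression constants as a function of $c$, differentiates it numerically at $c = 1$, and compares the result with the closed form (30.39). It takes the serial correlations to be $r_1 = -a/(1+b)$ and $r_2 = (a^2 - b - b^2)/(1+b)$, those of $u_{t+2} + a u_{t+1} + b u_t = \epsilon_{t+2}$. The function names are illustrative.

```python
import math

def regression_period(a, b, c):
    """Period 2*pi/theta' from (30.36)-(30.37) when r_1 and r_2 are both
    multiplied by c; r_1, r_2 belong to u_{t+2} + a u_{t+1} + b u_t = e_{t+2}."""
    r1 = -a / (1 + b)
    r2 = (a * a - b - b * b) / (1 + b)
    denom = 1 - (c * r1) ** 2
    a_c = -c * r1 * (1 - c * r2) / denom         # a' of (30.36)
    b_c = ((c * r1) ** 2 - c * r2) / denom       # b' of (30.36)
    theta = math.acos(-a_c / (2 * math.sqrt(b_c)))  # (30.37)
    return 2 * math.pi / theta

def slope_30_39(a, b):
    """(dP/dc) at c = 1 according to (30.39)."""
    P = regression_period(a, b, 1.0)
    num = -P * P * a * (1 + b) * (3 * b * b + b - a * a)
    den = 4 * math.pi * b * ((1 + b) ** 2 - a * a) * math.sqrt(4 * b - a * a)
    return num / den

a, b = -1.5, 0.9
h = 1e-5
numeric = (regression_period(a, b, 1 + h) - regression_period(a, b, 1 - h)) / (2 * h)
print("P at c = 1:       ", round(regression_period(a, b, 1.0), 3))
print("dP/dc, numerical: ", round(numeric, 3))
print("dP/dc, (30.39):   ", round(slope_30_39(a, b), 3))
print("P at c = 0.9:     ", round(regression_period(a, b, 0.9), 3))
```

The last line evaluates (30.37) exactly at $c = 0.9$. It can be set beside the first-order estimate obtained from (30.39).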


30.33. It is thus possible that the observed discrepancies between the autoregressive periods given by the regression constants and by the correlogram are due to superposed random fluctuation which is not incorporated into the autoregressive scheme. This is not the only possible explanation; for instance, in particular cases the disturbance function $\epsilon$ may not be random. The hypotheses to be considered in such a case, however, are so complex that it is difficult to pursue a quantitative investigation without a wealth of material; and this, unfortunately, is usually denied to us, at least in economic work. Meteorological data are more numerous, and we may hope that a re-examination of the material available in this field will throw further light on the autoregressive scheme.

30.34. Consider now the more extended autoregression equation

$$u_{t+m} + a_1 u_{t+m-1} + a_2 u_{t+m-2} + \dots + a_m u_t = \epsilon_{t+m}. \qquad (30.40)$$

The explicit solution cannot be given in the simple form available when $m = 2$. In general it has the solution

$$u_t = A_1\alpha_1^t + A_2\alpha_2^t + \dots + A_m\alpha_m^t + B, \qquad (30.41)$$

where $\alpha_1, \dots, \alpha_m$ are the roots of

$$\alpha^m + a_1\alpha^{m-1} + a_2\alpha^{m-2} + \dots + a_m = 0, \qquad (30.42)$$

and $B$ is a particular integral involving the $\epsilon$'s. For the series to be oscillatory without increasing indefinitely, no term such as $x^t$, where $x$ is real and greater than unity, can appear. Assume this to be so, and assume further that the series was "started up" some time before $t = 0$. The solution then reduces to the particular integral $B$.

Choose a particular value of $\xi_t = \sum_{j=1}^{m} A_j\alpha_j^t$ such that

$$\begin{aligned} \xi_0 &= 1,\\ \xi_1 + a_1\xi_0 &= 0,\\ \xi_2 + a_1\xi_1 + a_2\xi_0 &= 0,\\ &\ \,\vdots\\ \xi_{m-1} + a_1\xi_{m-2} + \dots + a_{m-1}\xi_0 &= 0. \end{aligned} \qquad (30.43)$$

This is always possible in general, for it imposes $m$ conditions on the $m$ constants $A$. Substitution then shows that a particular integral $B$ is given by

$$u_t = \sum_{j=0}^{\infty}\xi_j\,\epsilon_{t-j}, \qquad (30.44)$$

which is a generalisation of (30.24). Our series may then be regarded as generated by a moving average of infinite extent, the weights being combinations of damped harmonic and exponential terms.
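The weights $\xi_j$ can be computed in two ways, and the two should agree. The first uses the recursion implied by (30.43) and its continuation for $j \ge m$. The second uses the roots of (30.42), with the constants $A_j$ fixed by the first $m$ weights. The sketch assumes numpy and distinct roots; the function names are illustrative.

```python
import numpy as np

def xi_by_recursion(a, n_terms):
    """Weights xi_j of u_t = sum_j xi_j e_{t-j} for
    u_{t+m} + a_1 u_{t+m-1} + ... + a_m u_t = e_{t+m}."""
    m = len(a)
    xi = [1.0]                                    # xi_0 = 1
    for j in range(1, n_terms):
        # xi_j + a_1 xi_{j-1} + ... + a_m xi_{j-m} = 0, with xi_{<0} = 0
        xi.append(-sum(a[i] * xi[j - 1 - i] for i in range(min(m, j))))
    return np.array(xi)

def xi_by_roots(a, n_terms):
    """Same weights written as A_1 alpha_1^j + ... + A_m alpha_m^j,
    with the A's fixed by xi_0..xi_{m-1} (assumes distinct roots)."""
    m = len(a)
    alphas = np.roots([1.0, *a])                  # roots of (30.42)
    start = xi_by_recursion(a, m).astype(complex) # conditions (30.43)
    V = np.vander(alphas, m, increasing=True).T   # V[j, i] = alpha_i ** j
    A = np.linalg.solve(V, start)
    j = np.arange(n_terms)[:, None]
    return (alphas[None, :] ** j @ A).real

for coeffs in ([-1.5, 0.9], [-1.2, 0.5, -0.1]):
    print(coeffs, np.round(xi_by_recursion(coeffs, 6), 4))
```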

30.35. The correlogram of such a series may be determined by the following method, due to Walker (1931). Multiply (30.40) by $u_{t-k}$ and sum. We find

$$r_{k+m} + a_1 r_{k+m-1} + a_2 r_{k+m-2} + \dots + a_m r_k = \frac{\sum u_{t-k}\,\epsilon_{t+m}}{n\operatorname{var} u}. \qquad (30.45)$$

Now $u_{t-k}$ depends only on $\epsilon_{t-k}$ and terms with lower subscripts, and hence is uncorrelated with $\epsilon_{t+m}$ for $k > -m$. Thus we have

$$r_{k+m} + a_1 r_{k+m-1} + \dots + a_m r_k = 0, \qquad k > -m. \qquad (30.46)$$

If we multiply (30.40) by $u_{t+k+m}$ we find similarly

$$r_k + a_1 r_{k+1} + \dots + a_m r_{k+m} = \frac{\sum \epsilon_{t+m}\,u_{t+k+m}}{n\operatorname{var} u}, \qquad (30.47)$$

but here the expression on the right does not vanish. In fact $u_{t+k+m}$ contains the term $\xi_k\epsilon_{t+m}$, and hence

$$r_k + a_1 r_{k+1} + \dots + a_m r_{k+m} = \xi_k\,\frac{\operatorname{var}\epsilon}{\operatorname{var} u}, \qquad k > -m, \qquad (30.48)$$

where $\xi_k$ is taken as zero for $k < 0$.

From (30.46) it follows that the serial correlation will be given by

$$r_k = A_1\alpha_1^k + A_2\alpha_2^k + \dots + A_m\alpha_m^k, \qquad (30.49)$$

where the $\alpha$'s are the roots of (30.42). The $A$'s, which are not in general those of (30.41), are constants to be determined from initial conditions. Thus the correlogram will be the sum of terms which either decay exponentially to zero ($\alpha$ real) or oscillate with a similar decay to zero ($\alpha$ complex). Walker (1931) has used this result in an inquiry into a series of atmospheric pressures.
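Equation (30.46) also gives a practical way of computing the theoretical correlogram. The sketch below assumes numpy; the function name is illustrative. It takes the cases $j = k + m = 1, \dots, m-1$ of (30.46), uses $r_0 = 1$ and $r_{-i} = r_i$, and solves for $r_1, \dots, r_{m-1}$. It then runs the recurrence forward. For $m = 2$ it reproduces $r_1 = -a_1/(1 + a_2)$.

```python
import numpy as np

def scheme_correlogram(a, max_lag):
    """Serial correlations r_0..r_max_lag of
    u_{t+m} + a_1 u_{t+m-1} + ... + a_m u_t = e_{t+m}, from (30.46) written as
    r_j + a_1 r_{j-1} + ... + a_m r_{j-m} = 0 (j >= 1), r_0 = 1, r_{-i} = r_i."""
    coeffs = [1.0, *a]                       # a_0 = 1
    m = len(a)
    # Equations j = 1..m-1 involve only r_0..r_{m-1}: solve for r_1..r_{m-1}.
    M = np.zeros((m - 1, m - 1))
    rhs = np.zeros(m - 1)
    for j in range(1, m):
        for i, ai in enumerate(coeffs):
            lag = abs(j - i)
            if lag == 0:
                rhs[j - 1] -= ai             # r_0 = 1 is known
            else:
                M[j - 1, lag - 1] += ai
    r = [1.0, *np.linalg.solve(M, rhs)] if m > 1 else [1.0]
    # Remaining lags follow from the recurrence itself.
    for j in range(m, max_lag + 1):
        r.append(-sum(coeffs[i] * r[j - i] for i in range(1, m + 1)))
    return np.array(r[: max_lag + 1])

print(np.round(scheme_correlogram([-1.5, 0.9], 8), 4))
```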

The Autocorrelation Function 

30.36. If we have a series $u(t)$ defined at every point of time in some range $-h$ to $+h$, we may define its variance as

$$\sigma^2 = \frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt \qquad (30.50)$$

on the assumption that the mean value is zero, which does not limit our generality. Suppose the series is reduced to standard measure by dividing throughout by the square root of this variance. An evident generalisation of the serial correlation is then given by

$$r(k) = \frac{1}{2h}\int_{-h}^{h} u(t)\,u(t+k)\,dt. \qquad (30.51)$$

We shall call this the autocorrelation function. We can likewise regard it as defined when $h$ tends to infinity, provided that the limit on the right in (30.51) exists. It is to be noted that $r(k)$ is in that case an even function of $k$.


30.37. We shall also consider the function

$$R(k) = \int_{-\infty}^{\infty} u(t)\,u(t+k)\,dt, \qquad (30.52)$$

when it exists. We have

$$\int_{-\infty}^{\infty} R(k)\,e^{ipk}\,dk = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{ipk}\,u(t)\,u(t+k)\,dt\,dk = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} e^{ip(t+k)}\,u(t+k)\,e^{-ipt}\,u(t)\,dt\,dk.$$

The simple substitution $t + k = q$ reduces this to

$$\int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq\int_{-\infty}^{\infty} e^{-ipt}\,u(t)\,dt.$$

Thus, if we write

$$\alpha(p) + i\beta(p) = \int_{-\infty}^{\infty} e^{ipq}\,u(q)\,dq, \qquad (30.53)$$

we have, since $u$ is real and the second factor above is therefore $\alpha(p) - i\beta(p)$,

$$\int_{-\infty}^{\infty} R(k)\,e^{ipk}\,dk = \alpha^2(p) + \beta^2(p). \qquad (30.54)$$

It follows that the imaginary part on the left of (30.54) vanishes, as is also evident from the fact that $R(k)$ is an even function. We therefore have

$$\int_{-\infty}^{\infty} R(k)\cos kp\,dk = \alpha^2(p) + \beta^2(p). \qquad (30.55)$$

Following the notation of characteristic functions, write $\phi_R(p)$ for the integral on the left in (30.54) and $\phi_u(p)$ for that on the right in (30.53). We then have

$$\phi_R(p) = |\phi_u(p)|^2. \qquad (30.56)$$

We may then put

$$\phi_u(p) = \sqrt{\phi_R(p)}\;e^{i\mu(p)}, \qquad (30.57)$$

where $\mu$ is an arbitrary real function. We shall then have

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\phi_u(p)\,e^{-itp}\,dp = \frac{1}{2\pi}\int_{-\infty}^{\infty}\sqrt{\phi_R}\,\exp(i\mu - itp)\,dp. \qquad (30.58)$$

Since $u(t)$ must be real, the imaginary part vanishes and this is equivalent to

$$u(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty}\sqrt{\phi_R}\,\cos(\mu - tp)\,dp, \qquad (30.59)$$

and $\mu$ must be an odd function of $p$. The result is due to Wiener (1930). It shows that the autocorrelation function $R$ does not uniquely determine $u(t)$, because of the arbitrary function $\mu$.
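A discrete, finite-length analogue of Wiener's result can be sketched with the fast Fourier transform. The sketch assumes that circular sums over $n$ equally spaced values stand in for the integrals, and it uses numpy. It keeps the moduli $\sqrt{\phi_R}$ of the Fourier coefficients and replaces their phases by an arbitrary $\mu$. Working with `rfft`/`irfft` enforces Hermitian symmetry, which is the discrete form of requiring $\mu$ to be odd in $p$. The result is a different real series with exactly the same autocorrelation.

```python
import numpy as np

def circular_autocovariance(x):
    """R(k) = sum_t x_t x_{(t+k) mod n}, computed as the inverse FFT of |X(p)|^2."""
    X = np.fft.rfft(x)
    return np.fft.irfft(np.abs(X) ** 2, n=len(x))

rng = np.random.default_rng(1)
n = 256
u = rng.standard_normal(n)

U = np.fft.rfft(u)                          # phi_u at p = 2*pi*s/n, s = 0..n/2
mu = rng.uniform(-np.pi, np.pi, U.size)     # arbitrary phase function mu(p)
mu[0] = 0.0                                 # zero-frequency term must stay real
mu[-1] = 0.0                                # so must the Nyquist term (n even)
v = np.fft.irfft(np.abs(U) * np.exp(1j * mu), n=n)   # sqrt(phi_R) e^{i mu}, as in (30.57)

print("same autocorrelation:", np.allclose(circular_autocovariance(u), circular_autocovariance(v)))
print("same series:         ", np.allclose(u, v))
```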


30.38. Consider now the autocorrelation function $r(k)$ as defined in (30.51). Let us regard the series as defined, but equal to zero, outside the range $-h$ to $+h$. Then we have

$$2h\,r(k) = \int_{-h}^{h} u(t)\,u(t+k)\,dt = \int_{-\infty}^{\infty} u(t)\,u(t+k)\,dt = R(k), \qquad (30.60)$$

where $R$ and $r$ are zero outside the range $-2h$ to $+2h$. The foregoing results then continue to hold, with some modifications concerning factors in $2h$. If we write

$$\phi_r(p) = \int_{-2h}^{2h} r(k)\,e^{ikp}\,dk = \frac{1}{2h}\int_{-2h}^{2h} R(k)\,e^{ikp}\,dk \qquad (30.61)$$

and

$$\phi_u(p) = \int_{-h}^{h} e^{ipt}\,u(t)\,dt, \qquad (30.62)$$

then, corresponding to (30.56), we have

$$2h\,\phi_r(p) = |\phi_u(p)|^2. \qquad (30.63)$$

We may now let $h$ tend to infinity. The results continue to hold under certain general conditions, provided that the limits exist.


Example 30.7

Consider the series

$$u(t) = A_1\sin(\lambda_1 t + \alpha_1) + A_2\sin(\lambda_2 t + \alpha_2) + \dots + A_m\sin(\lambda_m t + \alpha_m),$$

the $\lambda$'s being distinct. For the variance we have

$$\lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h} u^2(t)\,dt = \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h}\sum_j\{A_j^2\sin^2(\lambda_j t + \alpha_j)\}\,dt,$$

since the cross-product terms contribute only a finite amount to the integral and hence vanish in the limit. This is

$$= \lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h}\sum_j\left[\tfrac12 A_j^2\{1 - \cos 2(\lambda_j t + \alpha_j)\}\right]dt = \tfrac12\sum_j A_j^2.$$

Similarly for $u(t)\,u(t+k)$ we have

$$\lim_{h\to\infty}\frac{1}{2h}\int_{-h}^{h}\sum_j\left[\tfrac12 A_j^2\{\cos\lambda_j k - \cos(\lambda_j(2t + k) + 2\alpha_j)\}\right]dt = \tfrac12\sum_j A_j^2\cos\lambda_j k.$$

Thus

$$r(k) = \frac{\sum_j A_j^2\cos\lambda_j k}{\sum_j A_j^2}.$$

The correlogram is the sum of a series of harmonics, like the original series, but the coefficients are different and the harmonics are all in phase.
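The limiting result of Example 30.7 can be checked numerically. The sketch replaces the limit $h \to \infty$ by an average over a long but finite stretch of time. The amplitudes, frequencies and phases are arbitrary illustrative values, and numpy is assumed.

```python
import numpy as np

A = np.array([2.0, 1.0, 0.5])        # amplitudes A_j (illustrative values)
lam = np.array([0.7, 1.9, 3.1])      # distinct angular frequencies lambda_j
alpha = np.array([0.3, 2.0, -1.0])   # phases alpha_j

def u(t):
    """u(t) = sum_j A_j sin(lambda_j t + alpha_j), evaluated on an array of times."""
    return np.sin(np.outer(t, lam) + alpha) @ A

def r_time_average(k, T=5000.0, dt=0.05):
    """Estimate r(k) by averaging over 0 <= t < T (a stand-in for h -> infinity)."""
    t = np.arange(0.0, T, dt)
    x, y = u(t), u(t + k)
    return float(np.mean(x * y) / np.mean(x * x))

def r_example_30_7(k):
    """r(k) = sum A_j^2 cos(lambda_j k) / sum A_j^2."""
    return float(np.sum(A**2 * np.cos(lam * k)) / np.sum(A**2))

for k in (0.5, 1.3, 4.0):
    print(k, round(r_time_average(k), 4), round(r_example_30_7(k), 4))
```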


30.39. The idea underlying the autoregressive scheme of representing time-series may perhaps be best illustrated by an analogy. Imagine a motor-car proceeding along a horizontal road with an irregular surface. The car is fitted with springs which permit it to oscillate to some extent but are designed to damp out the oscillations as soon as the comfort of the passengers will permit. If the car strikes a bump or a pothole in the road, the body will oscillate up and down for a time but will soon come to rest so far as vertical motion is concerned. If, however, it proceeds over a continual succession of bumps, there will be continual oscillation of varying amplitude and distance between peaks. The oscillations are continually renewed by disturbances, though the distribution of the latter along the road may be quite random. The regularity of the motion is determined by the internal structure of the car, but the existence of the motion is determined by external impulses.

30.40. It appears to me very plausible to suppose that oscillations in time-series are generated in this way. One does not have to postulate some external rhythmic influence which keeps the oscillation going, or to suppose that the system will oscillate without damping once it has been set in motion. Nor is it necessary to assume that the majority of the deviations between theory and observation are due to "errors" which exert no effect on the subsequent movement of the system. The reader, however, will have to form his own opinion on this matter.* We now proceed to examine an alternative scheme in which the series is represented as a sum of (undamped) cyclic terms.


Periodogram Analysis

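30.41. It is well known that, under certain general conditions, a function $f(t)$ can be expanded in a Fourier series, valid in a certain range:

$$f(t) = a_0 + a_1\cos\frac{\pi t}{\lambda_1} + a_2\cos\frac{2\pi t}{\lambda_1} + a_3\cos\frac{3\pi t}{\lambda_1} + \dots + b_1\sin\frac{\pi t}{\lambda_1} + b_2\sin\frac{2\pi t}{\lambda_1} + b_3\sin\frac{3\pi t}{\lambda_1} + \dots \qquad (30.64)$$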

* The scheme considered in this chapter may over-simplify natural conditions in that it assumes finite random disturbances at equidistant time-intervals. If the intervals are not equal, or if the disturbances are small and continually occurring, the autoregressive scheme is only an approximation. Much remains to be done on this subject.
Functions which are not periodic can also be expanded in this way. For instance, in the range $0 < x < \pi$,

$$\frac{x}{2} = \sin x - \frac12\sin 2x + \frac13\sin 3x - \frac14\sin 4x + \dots$$

The sum of the series, of course, repeats itself periodically (with period $2\pi$) outside this range, and so ceases to represent $x/2$ there.

As a representation of observed series the Fourier series is rather restricted in scope, since the period of every term is an exact submultiple of the fundamental period $2\lambda_1$. A more general scheme is provided by the series

$$f(t) = a_0 + a_1\cos\frac{2\pi t}{\lambda_1} + a_2\cos\frac{2\pi t}{\lambda_2} + \dots + b_1\sin\frac{2\pi t}{\lambda_1} + b_2\sin\frac{2\pi t}{\lambda_2} + \dots, \qquad (30.65)$$

or by the alternative form

$$f(t) = A_0 + A_1\cos\left(\frac{2\pi t}{\lambda_1} + \alpha_1\right) + A_2\cos\left(\frac{2\pi t}{\lambda_2} + \alpha_2\right) + \dots \qquad (30.66)$$

Here the $\lambda$'s are not necessarily commensurable. The object of our analysis is, first, to find the best values of the $\lambda$'s to select and, secondly, to evaluate the other constants $a$ and $b$, or $A$ and $\alpha$.

30.42. Suppose we wish to test whether a time-series contains a harmonic term with period $\mu$. Consider the series

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j\cos\frac{2\pi j}{\mu}, \qquad (30.67)^*$$

$$B = \frac{2}{n}\sum_{j=1}^{n} u_j\sin\frac{2\pi j}{\mu}, \qquad (30.68)$$

and write

$$S^2 = A^2 + B^2. \qquad (30.69)$$

Suppose that the series is in fact given by

$$u_j = a\sin\frac{2\pi j}{\lambda} + v_j, \qquad (30.70)$$

where $v_j$ is a component which we assume to contain no cyclical element, so that its correlation with the other component is zero, at least for long series. Then we have

$$A = \frac{2}{n}\sum_{j=1}^{n} a\sin\frac{2\pi j}{\lambda}\cos\frac{2\pi j}{\mu} + \frac{2}{n}\sum_{j=1}^{n} v_j\cos\frac{2\pi j}{\mu},$$

and the second term may be neglected. Thus, writing

$$\alpha = \frac{2\pi}{\lambda}, \qquad \beta = \frac{2\pi}{\mu},$$

we have

$$A = \frac{2a}{n}\sum_{j=1}^{n}\sin\alpha j\cos\beta j = \frac{a}{n}\sum_{j=1}^{n}\{\sin(\alpha - \beta)j + \sin(\alpha + \beta)j\}$$

$$= \frac{a}{n}\left[\frac{\sin\frac12(\alpha-\beta)n\,\sin\frac12(\alpha-\beta)(n+1)}{\sin\frac12(\alpha-\beta)} + \frac{\sin\frac12(\alpha+\beta)n\,\sin\frac12(\alpha+\beta)(n+1)}{\sin\frac12(\alpha+\beta)}\right]. \qquad (30.71)$$

For large $n$ this remains small unless $\alpha$ approaches $\beta$ (or $-\beta$, which is essentially the same situation). In that case we have

$$A \simeq a\sin\tfrac12(\alpha - \beta)(n+1).$$

Similarly,

$$B \simeq a\cos\tfrac12(\alpha - \beta)(n+1),$$

so that

$$S^2 = A^2 + B^2 \simeq a^2. \qquad (30.72)$$

Thus $S$ remains small unless the "trial" period $\mu$ approaches the real period $\lambda$, and in that case it equals the amplitude $a$.

* Some writers define these sums with $j$ running from 0 to $n - 1$. The signs of $A$ and $B$ may then differ from those given by (30.67) and (30.68), but the intensity and phase are unaffected.
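The closed form (30.71) and the behaviour of $S$ can be checked directly. This plain-Python sketch computes $A$ and $B$ from (30.67) and (30.68) for a pure harmonic. It compares $A$ with (30.71) for a trial period that differs from $\lambda$, and prints $S$ for three trial periods. The names and the values $a = 3$, $\lambda = 11$, $n = 300$ are illustrative.

```python
import math

def trial_A_B(u, mu):
    """A and B of (30.67)-(30.68) for trial period mu; u[0] holds u_1."""
    n = len(u)
    A = 2.0 / n * sum(x * math.cos(2 * math.pi * j / mu) for j, x in enumerate(u, start=1))
    B = 2.0 / n * sum(x * math.sin(2 * math.pi * j / mu) for j, x in enumerate(u, start=1))
    return A, B

def A_from_30_71(a, lam, mu, n):
    """Closed form (30.71) for u_j = a sin(2 pi j / lam); requires lam != mu."""
    alpha, beta = 2 * math.pi / lam, 2 * math.pi / mu
    def sine_sum(x):  # sum_{j=1}^{n} sin(x j)
        return math.sin(x * n / 2) * math.sin(x * (n + 1) / 2) / math.sin(x / 2)
    return a / n * (sine_sum(alpha - beta) + sine_sum(alpha + beta))

a, lam, n = 3.0, 11.0, 300
u = [a * math.sin(2 * math.pi * j / lam) for j in range(1, n + 1)]
for mu in (9.0, 11.0, 13.0):
    A, B = trial_A_B(u, mu)
    print(f"mu = {mu:4.1f}: A = {A:+.4f}, B = {B:+.4f}, S = {math.hypot(A, B):.4f}")
print("A from (30.71) at mu = 13:", round(A_from_30_71(a, lam, 13.0, n), 4))
```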

30.43. Similarly, we may expect that if the series consists of a sum of harmonics with periods $\lambda_1, \lambda_2, \dots, \lambda_k$, then $S$ will be small unless $\mu$ is equal to one of these periods. In that case $S$ is finite and equal to the amplitude of the term concerned.

This result forms the basis of what is known as periodogram analysis. We select a number of trial periods for different values of $\mu$ and calculate $S^2$ for each of them. $S^2$, which is called the intensity, is then exhibited as a function of $\mu$ and graphed as ordinate against $\mu$ as abscissa. The diagram obtained by joining the points, each to the next, is called the periodogram. Suppose this figure has peaks at certain values $\lambda_1, \dots, \lambda_k$, and we are prepared to assume that these are not sampling accidents. The values are then the appropriate periods of harmonic terms, and the square roots of the intensities provide the corresponding amplitudes. The quantities $A$ and $B$ of (30.67) and (30.68) are obtained incidentally and provide the phase angles $\alpha$ of (30.66). We shall illustrate the arithmetic processes below.
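A periodogram in this sense can be sketched as follows. The series is synthetic: two harmonics with periods 15.25 and 5.5 and amplitudes 2 and 1, plus a Gaussian irregular component from a seeded generator. The grid of trial periods is an arbitrary choice. None of these values comes from the wheat-price data. Since $S$ approximates the amplitude at a peak, the square roots of the two peak intensities should be near 2 and 1.

```python
import math
import random

def intensity(u, mu):
    """S^2 = A^2 + B^2 of (30.67)-(30.69) for the trial period mu (u[0] is u_1)."""
    n = len(u)
    A = 2.0 / n * sum(x * math.cos(2 * math.pi * j / mu) for j, x in enumerate(u, start=1))
    B = 2.0 / n * sum(x * math.sin(2 * math.pi * j / mu) for j, x in enumerate(u, start=1))
    return A * A + B * B

rng = random.Random(7)
n = 300
u = [2.0 * math.sin(2 * math.pi * j / 15.25 + 0.4)   # harmonic, period 15.25
     + 1.0 * math.cos(2 * math.pi * j / 5.5)         # harmonic, period 5.5
     + rng.gauss(0.0, 0.3)                           # irregular component
     for j in range(1, n + 1)]

trial_periods = [2.0 + 0.05 * i for i in range(541)]     # 2.00, 2.05, ..., 29.00
periodogram = [(mu, intensity(u, mu)) for mu in trial_periods]

best = max(periodogram, key=lambda pt: pt[1])
best_short = max((pt for pt in periodogram if 4.0 <= pt[0] <= 7.0), key=lambda pt: pt[1])
print("highest peak:   period %.2f, amplitude %.2f" % (best[0], math.sqrt(best[1])))
print("peak in 4 to 7: period %.2f, amplitude %.2f" % (best_short[0], math.sqrt(best_short[1])))
```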


30.44. Fig. 30.9 shows the periodogram of the wheat-price index data of Table 30.1. In order not to confuse the diagram for lower values of the trial period, we have shown only the major fluctuations. The series covered the 300 years from 1545 to 1844; the earlier and later figures shown in Table 30.1 were not taken into account. The primary data have been taken from Sir William Beveridge's classical paper (1922) and are shown in Table 30.9. For practical reasons which will emerge presently, certain trial periods are taken not over exactly 300 years but over the number $N$ of years shown in the table. To make the figures comparable, Beveridge therefore multiplied the sum $A^2 + B^2$ by $N/300$.
TABLE 30.9

Periodogram Analysis of the Beveridge Wheat-Price Index Data of Table 30.1.

(From J.R.S.S., 1922, 85, 412.)

The first observation relates to 1545, except where A and B are given in heavy type.

 Period   Number of      A        B     Intensity
(years)   years, N                     N(A² + B²)/300

  2.000     300       +0.11     0.00     0.01
  2.049     330       -0.40     0.09     0.19
  2.054     304       +0.48    -0.72     0.77
  2.061     340       +0.38    -0.57     0.54
  2.069     300        0.25    +0.63     0.46
  2.074     336       -0.61    +0.51     0.71
  2.080     312       +0.92     0.50     1.14
  2.087     288        0.52    -0.11     0.27
  2.095     308       -0.91    +0.90     1.69
  2.105     320       +0.90    +0.07     0.86
  2.112     288       +0.90    +0.80     1.38
  2.133     320       +0.89    +0.15     0.84
  2.154     308       +0.48    +0.23     0.29
  2.182     288       +1.32    -0.59     1.99
  2.200     308       -0.13    -0.60     0.39
  2.222     320       -0.32     0.62     0.52
  2.261     312       +0.50     0.22     0.31
  2.286     320       -0.38    -0.85     0.93
  2.316     308       +1.39    -1.05     3.11
  2.333     308       -0.10     0.25     0.08
  2.353     320       +0.90    +0.07     0.86
  2.364     312       -0.12    -0.63     0.43
  2.370     320       +0.05    -0.28     0.08
  2.375     304       +0.29     0.43     0.27
  2.381     300       -0.19     1.22     1.53
  2.385     310       -1.00    -0.89     1.86
  2.391     330       -1.30    -0.54     2.18
  2.395     309        0.72    +0.60     0.90
  2.400     312       +0.34    +0.68     0.60
  2.412     328       -0.08     0.65     0.47
  2.417     348       +0.63    +0.57     0.69
  2.435     336       +0.44    +0.01     0.22
  2.452     304       -1.40    -0.51     2.23
  2.462     320       -0.25    +1.49     2.44
  2.476     312       -0.38    +0.35     0.27
  2.483     288       -0.07    +0.74     0.53
  2.500     320        0.24    +1.19     1.56
  2.512     324       +0.86    +0.39     0.97
  2.516     312       +0.45    +0.24     0.26
  2.529     301        0.19    -0.31     0.13
  2.545     336       -1.39    -0.81     2.89
  2.555     322       +0.38    +0.50     0.42
  2.571     306       +1.25    +0.55     1.91
  2.588     308       +0.30    +0.43     0.28
  2.600     312       +1.02    -0.39     1.25
  2.615     306       -0.75    -0.24     0.63
  2.625     294       -0.45    +1.36     2.01
  2.643     296       +0.95     0.62     1.27
  2.667     312       -0.92    +1.20     2.38
  2.687     301       +1.23    -0.02     1.52
  2.692     315       -0.04    +0.23     0.06
  2.706     322       -0.27    +1.33     1.97
  2.714     304       +0.83    +1.17     2.10
  2.727     300       +0.86    +1.46     2.87
  2.733     287       +2.05    +1.49     6.16
  2.735     279       +2.44    +1.23     7.82
  2.737     312       +2.23    +1.00     6.22
  2.741     296       +2.43    +0.25     5.86
  2.750     308       +0.90    -0.84     1.55
  2.762     348       -0.57    -0.04     0.37
  2.769     324       +1.49    +0.23     2.28
  2.778     325       +1.20    -0.92     2.48
  2.800     336       -1.01    -0.19     1.18
  2.818     310       +0.55    +1.07     1.49
  2.833     323       +0.78    -0.10     0.67
  2.841     296       +0.41    +0.42     0.34
  2.857     320       +0.90    +0.21     1.03
  2.875     322       +0.35    +0.14     0.15
  2.888     312       +1.51    +0.20     2.43
  2.895     330       -0.69    -1.57     3.21
  2.909     320       +0.70    -1.11     1.84
  2.933     308       -0.04    +0.39     0.16
  2.947     336       -0.93     1.19     2.57
  2.960     296        0.00     1.15     1.30
  3.000     300        0.29    -0.39     0.23
  3.040     304       +0.09    +0.75     0.58
  3.077     320       +0.05    +1.18     1.50
  3.111     336       +0.91     0.44     1.15
  3.143     308       +2.01    +0.23     4.20
  3.167     304       +0.46    -1.05     1.33
  3.200     320       +0.43    +0.95     1.16
  3.217     296       +1.25    +0.00     1.55
  3.250     312       -1.22    -0.47     1.80
  3.273     324       -0.55    +1.18     1.82
  3.286     322       -0.11    +0.99     1.07
  3.304     304       +0.13    +0.75     0.59
  3.333     320       +0.90    +1.58     3.54
  3.364     296       +1.76    +0.98     4.00
  3.375     324       +0.55    +0.92     1.24
  3.385     308       +0.35    +1.03     1.21
  3.400     323       +1.12    +2.37     7.41
  3.407     276       +2.98    +2.81    14.90
  3.412     348       +1.27    -3.98    15.53
  3.417     328       +3.08    -2.24    15.84
  3.429     288       +3.11    -1.40    11.16
  3.444     310       +0.09    -0.99     1.03

Period 

(Years), 

Number 



Intensity 

Period 

(Years). 

Number 



Intensity 

of Years 

A, 

B. 

N (A^ + B*) 

of Years 

A. 

B. 

N (A* + B«) 

N. 



300 

N. 



300 

3-465 

304 

+ 0-66 

-f 0-29 

0-39 

4-933 

296 

+ 1-67 

+ 1-68 

4-91 

3-462 

316 

+ 1-67 

4- 102 

4-87 

5-000 

300 

+ 1-86 

+ 1-00 

4-30 

3-600 

308 

-f 1-20 

- 0-94 

2-38 

6-067 

304 

- 0-06 

+ 3-98 

16-09 

3-624 

296 

+ 1-41 

- 1-18 

3-31 

6-091 

336 

- 0-73 

+ 5-55 

35-06 

3-638 

322 

-f 0-60 

- 1-46 

2-53 

6-100 

306 

+ 6-71 

+ 2-98 

42-34 

3-556 

320 

+ 0-02 

- 0-43 

0-20 

51 11 

322 

+ 6-70 

+ 0-29 

34-91 

3-671 

326 

+ 0-80 

- 0-69 

1-21 

6-125 

328 

+ 3 97 

+ 2 90 

26-38 

3-600 

324 

- 1-03 

4- 0-82 

1-88 

5-143 

324 

+ 2-46 

+ 2-46 

13-09 

3-610 

304 

+ M8 

4- 1*23 

2-94 

6-200 

312 

+ 0-02 

+ 0-30 

0-10 

3-636 

320 

+ 114 

4- 0 -13 

1-39 

5-250 

294 

+ 1-74 

+ 1-92 

6-56 

3-643 

306 

- 0-16 

4- 0-27 

0-10 

6-333 

320 

+ 0-71 

- 4-46 

21-72 

3-667 

308 

-2U 

- 1-07 

5-87 

6-400 

324 

+ 1-04 

+ 3-71 

16-06 

3-679 

309 

+ 0-34 

- 1-90 

3-83 

6-415 

325 

+ 4-27 

+ 1-90 

23-66 

3-692 

288 

+ 1-28 

~ 0-22 

1-63 

6-429 

304 

+ 4-72 

~ 0-28 

22-61 

3-700 

296 

4- 0-90 

- 0-69 

1-18 

6-466 

300 

+ 1-37 

- 3-73 

15-76 

3-714 

312 

+ 116 

4- 1-78 

4-66 

5-600 

308 

- 1-04 

+ 1-49 

3-39 

3-727 

287 

- 0-46 

- 1-65 

2-72 

6-565 

300 

+ 2-40 

- 0-68 

6-23 

3-760 

316 

-f 0-64 

- 0-06 

0-44 

6-600 

336 

+ 0-46 

+ 1-21 

1-88 

3-778 

306 

- 1-17 

- 0-68 

1-86 

6-667 

306 

+ 6-31 

^ 1-97 

32-72 

3-800 

304 

+ 1*60 

4- 0-80 

3 24 

5-692 

296 

+ 2-05 

- 3-91 

19-18 

3-833 

322 

- 1-12 

- 1-63 

4-17 

6-714 

320 

+ 0-35 

- 2-13 

4-97 

3-867 

324 

+ 1-63 

+ 0-45 

3-08 

6-750 

322 

+ 1-39 

- 0-33 

2-18 

3-888 

280 

~ 0-16 

+ 0-66 

0-43 

5-800 

290 

+ 3-66 

- 2-76 

19-47 

3-896 

296 

~ 0-66 

4- 1-00 

1 42 

6-846 

304 

+ 0-00 

- 2-29 

.5-3.5 

3-923 

306 

4- 0-64 

- 1-61 

3-06 

6-933 

366 

+ 4-37 

+ 0 91 

23-03 

3-962 

309 ! 

- 0-67 

4- 1-741 

3-69 

6-000 

300 

- 3-50 

- 0-12 

12-29 

4-000 

300 j 

4- 1*47 

- 1-13 

3-64 

6-111 

330 : 

- 0-79 

- 1 90 

4-66 

4-077 

318 ! 

4- 0-67 

- 0-26 

0-41 

6-143 

301 

+ 0-74 

- 2-96 

9-32 

4-111 

296 

1 + M3 

- 1-70 

4-13 

6-167 

296 

- 0-22 

-- 2-94 

8-66 

4-143 

290 1 

- 0-60 

+ 0-23 

0-30 

6-200 

310 

- 2-02 

- 3-38 

16-02 

4-167 

^25 

1 + 1*21 

+ 0-32 

1 1-70 

6-250 

326 

! - 3-23 

- 0-11 

11-30 

4-173 

322 

! 4- 0-66 

- 1-46 

2-77 

6-286 

308 

! - 1-72 

- 0-69 

3-41 

4-200 

294 1 

- 0-99 

- 0-41 

1-02 

6-333 

304 

! - 1-62 

+ 1-29 

4-02 

4-260 

323 1 

4- 0-60 

- 2-73 

8-32 

6-400 

320 

i + 0-80 

+ 2-74 

8-71 

4-286 

300 i 

- 0-65 

4* 0-79 

1-04 

6-600 ! 

312 

1 + 0-69 

- 0-73 

0-94 

4-333 

312 1 

- 1-60 

- 1-30 

4-10 

6-671 

322 

1 + 1-49 

- 0-77 

i 3-02 

4-363 

296 

- 2-85 

- 0-24 

8-06 

6-667 i 

320 

i + 0-25 

+ 0-21 

i 0-11 

4-364 

288 

- 2-98 

+ 0-76 

9-07 

6-727 

296 

+ 0-08 

- 0-13 

0-02 

4-376 

315 

- 2-47 

+ 0-87 

7-19 

6-760 

324 

- 0-20 

- 1-66 

' 3-01 

4-386 

342 

i - 0-50 

+ 2-55 

7-72 

6-800 

306 

+ 0-23 

- 0-66 

j 0-48 

4-400 

308 

- 1-38 

+ 3-27 

12-89 

6-909 

304 

+ 0-68 

+ 2-66 

1 7-00 

4-412 

300 

4- 0-08 

+ 3-62 

13-11 

6-933 

312 

+ 1-68 

+ 2-01 

i 7-16 

4-417 

318 

4- 0-87 

+ 3-85 

16'48 

7-000 

308 

+ 3-10 

-- 2-17 

! 14-74 

4-429 

310 

4- 1-80 

+ 2-41 

9-32 

7-143 

300 

+ 1-83 

-1-861 6-79 

4-444 

320 

+ 2-16 

' + 0-83 

1 5-66 

7-200 

324 

j + 0-54 

- 3-93 

16*96 

4-471 

304 

; 4- 0-91 

+ 0-79 1 1-48 

7-333 

308 

! + 1-62 

- 2-81 

10-46 

4-600 

306 

j 4- 1-87 

+ 0-72 

4-09 

7-400 

296 

- 2-33 

- 2-72 

12-66 

4-671 

320 

i - 0-21 

+ 0-04 

0-22 

7-417 

366 

+ l-50i - 4-01 

21*72 

4-600 

322 

! - 0-08 

+ 1-24 

1-66 

7-429 

312 

- 3-80 

j - 1-49 

17-28 

4-667 

336 

+ 0-19! 4- 0-93 

1-00 

7-600 

315 

+ 0-17 

+ 1-60 

2-40 

4-760 

304 

: - 0-12 

+ 2-28 

5-28 

7-600 

304 

- 2-33 

1 - 1-37 

7*43 

4-800 

288 

+ 2-44 

+ 1-081 6-84 

7-667 

322 

- 1-46 

- 2-61 

9-57 

4-867 

306 

- 1-06 

- 1-30 ! 2-89 

7-760 

310 

+ 1-38 

- 0-39 

2-13 

4-888 

1 312 

- 1-80 

+ 2-11 

800 

7-867 

; 330 

- 0-50 

+ 0-28 

0*36 
Period

(Years). 


Number 
of YearsI 

N. 


8000 

8091 

8-200 

8-222 

8-333 

8-500 

8- 667 ; 
8-800 : 

9- 000 i 
9-200 * 
9-333 i 

9- 500 ; 
9-667 ! 

9- 750 • 

9- 818 ' 

10- 000 i 

10 - 200 , 

10- 250 
10-400 
10-500 ■ 

10- 750 i 
10-800 1 

11 - 000 
11-200 I 
11-500 ! 

11- 667 i 

12 - 000 j 
12-143 
12-333 I 
12-500 
12-667 i 

12-800 i 

12- 875 i 

13- 000 
13-333 
13-500 

13- 667 

14- 000 
14-500 

14- 667 

15- 000 
15-200 
15-250 
15-286 
15-333 

15- 500 

16- 000 

16- 667 

17- 000 
17-333 


B. 


1 ... 


312 

356 

287 

296 

325 

323 
312 
308 
306 
322 
336 
304 
290 
312 

324 
320 
306 
328 
312 
294 
301 

324 

308 
336 
322 
280 
312 
340 
296 

325 
304 
320 

309 
312 

320 
324 
328 
308 
290 
308 
300 

304 

305 

321 

322 

310 
320 
300 

306 
312 


- 3-96 
+ 4-32 
f 1*62 
+ 0-19 
4 - 0-21 
+ 0-17 
4* 2-51 
4- 2-97 

- 1-51 

- 0-16 
- 0 - 74 ! 
4- 1-08 
4 5-03 


4- 1-34 

- 0 98' 

- 0-64 I 

- 0-56 ; 
4- 0-91 i 
+ 3-19 I 

- 1-01 
4- 0-83 • 

- 0-57 

- 1-561 
4- 0 64i 
4- 1-07 i 
4- 0-37 i 


4- 4-46 I - 3-56 i 


4 - 1*21 

- 1-19 
4 - 0-86 

- 0-69' 
4 - 1-88 
1- 2-46 
f 1-47 
+ 1-00 

- 3-85 

- 2-481 

- 1-32 


4 94i 

- 0-83 j 
0-22 

4 1 10 

- 1-65 

- 1-82 

- 3-13! 

- 4-75 i 

- 4-26 i 
+ 0 55i 

- 0 - 66 ! 


4- 0-46! + 1-42 


. 2-47 

- 0 22i 

- 2-44 
~ 1-22 
4- 2-28 
+ 5-70 
4- 6-46 
4- 4-26 
+ 0-40 
4- 2-56 
+ 3 49 ! 
4- 115 

- 3-78 

- 1-50 
4- 6-32 
+ 1-19 

- 0-28 

- 2-35 

- 3-89 

- 6-92 

- 1-46 
4" 5-21 
4- 2-56 
~ 3-04 


4-04 , 


4 - 37 ! 


4- 2-74 ! 
4 2-63 : 
4- 5 - 19 ! 
4 - 3-26 i 
-f- 0-77 I 
- 4-32 i 


+ 0-37 ! 
2-09 i 

- 1 34] 

- 1-00 i 

- 0-18 I 
4- 4-23 

- 2-66 

- 8-52 

- 8-65 

- 7-15 
6-55 

- 2-02 
4- 4-52 

- 0-39 

- 6-35 

- 6-65 


Intensity 

PericKi 

Years). 

N’umber 1 


i 

i 

Intensity 

N{A*+B*) 

300 

of YearsI 
N. 

-4. 

1 

1 

1 

N (A» + B*). 
300 

18-67 

17-500 

1 

280 i 

- 6-18 

- 4-46 1 

54-12 

23-23 

18-000 

306 

- 4-40 

+ 1-26 ! 

21-29 

2-90 

18-500 

296 > 

- 1-46 

+ 2-26 ! 

7-10 

0-34 

19-000 

304 1 

-h 1-00 

- 0-23 

1-07 

0-95 

19-750 

316 ! 

- 4-73 

- 1-59 1 

26-25 

10-41 

20-000 

320 1 

- 5-71 

4- 1-69 i 

37-88 

7-59 

21-000 

204 1 

f- 0-78 

+ 2-61 1 

7-28 

9-77 

22-000 

308 j 

4- 1-87 

4- 1-681 

6-18 

2-66 

23-000 

322 1 

- 2-46 

- 1-43 1 

8-61 

2-65 

24-000 

288 1 

4- 0-45 

+ 619! 

26-10 

1-08 

24-667 

296 

+ 4 31 

4- 1-99 i 

22-21 

2-26 

25-000 

325 1 

4- 3-86 

- 0-19 i 

14-94 

24-55 

26-000 

312 

+ 1-23 

- 1-.34! 

3-43 

33-89 

27-000 

324 

+ 0-50 

- 0-33 1 

0-38 

27-90 

28-000 

308 

- 0-49 

4* 0-68 

0-72 

2-25 

29-000 

290 

4 - 1-08 

- 2-12 

5-46 

0-80 

30-000 

300 

- 1-53 

- 2-34 1 

7-81 

1-84 

31-000 

310 

- 1-98 

+ 0-13i 

4-06 

6-52 

32-000 

320 

- 0-37 

4- 0-61 

0-42 

9-19 

33-000 

330 

+ 0-96 

- 0-78 

1-68 

11-98 

34-000 

306 

- 3-00 

- 2-16 

13-90 

25-48 

35-000 

280 

- 4-64 

4- 1*79 

23-11 

33-84 

36-000 

288 

- 1-66 

4- 4-86 

23-29 

7-24 

37-000 

296 

4- 2-08 

4- 3-92 

19-47 

2-34 

38-000 

304 

4* 2-99 

4- 0-66 

9-37 

2-07 

40-000 

320 

- 1-44 

- 0-63 

2-63 

23-30 

41-000 

328 

- 1-93 

4- 0-93 

6-01 

21-66 

42-000 

294 

4- 0-93 

+ 3-02 

9-75 

11-43 

44-000 

308 

f 3-00 

- 0-14 

9-27 

9-13 

46-000 

315 

4~ 1-69 

- 1-99 

7-14 

32-58 

46-000 

322 

i 4“ 0 -16 

- 2-27 

5-58 

46-01 

48-000 

288 

- 0-76 

- 0-09 

0-66 

43-58 

60-000 

300 

+ 1-83 

4- 2 -19 

8-14 

38-23 

52-000 

312 

4- 4-77 

- 0-57 

24-03 

0-32 

63-000 

318 

4- 4-22 

- 2-60 

26-08 

11-79 

64-000 

324 

4 2-84 

- 4-01 

26-09 

15-28 

66-000 

330 

4- 3-54 

- 3-30 

25-82 

2-38 

56-000 

336 

+ 3 31 

[ - 2-36 

18-47 

13-82 

58-000 

290 

4- 3-89 

+ 1*49 

16-82 

20-69 

60-000 

300 

~ 3-08 

- 0-93 

10-32 

46 83 

62-000 

! 310 

- 1-62 

+ 0-39 

2-88 

75-04 

64-000 

1 320 

- 0-78 

4-0-13 

0-66 

76-17 

66-000 

! 330 

- 0-56 

- 0-66 

0-69 

60-62 

68-000 

' 340 

4- 2-91 

) - 1-88 

1 13-58 

62-29 

70-000 

280 

- 0-6fl 

» - 0-16 

0-47 

59-11 

74-000 

296 

- 1-2C 

\ 4-0-82 

2-07 

’ 24-02 

76-000 

304 

- 0-6€ 

\ 4- 1*17 

1-83 

27-33 

78-000 

312 

4- 0-5i 

\ 4- 1-26 

2-00 

47-84 

80-000 

> 320 

4- 0-7'; 

J 4-0-82 

1-34 

54-55 

84-000 

> 336 

4- 0-2 

6 4- 0-6^ 

) 0-62 


An examination of the periodogram suggests the possibility of 20 periods, as follows:

 Period    Corrected intensity   |    Period    Corrected intensity
(years)    N(A² + B²)/300        |   (years)    N(A² + B²)/300

  2.736          7.82            |    11.000         33.84
  3.417         15.84            |    12.000         23.30
  4.417         16.48            |    12.800         46.01
  5.100         42.34            |    15.250         76.17
  5.415         23.66            |    17.333         54.55
  5.667         32.72            |    20.000         37.88
  5.933         23.63            |    24.000         26.10
  7.417         21.72            |    36.000         23.29
  8.091         23.23            |    54.000         26.09
  9.750         33.89            |    68.000         13.58


This is evidently rather an embarrassing profusion of possibilities, and we cannot immediately accept all these periods as significant. Sir William discussed them in detail in the original paper and was inclined to attribute reality to 18 or 19 of them. His reasons were partly ones which do not concern us here, such as the existence of weather oscillations with these "periods". In particular, where a period had a high intensity he analysed the two halves of the series separately to see whether the period persisted, and found that most of them did.


30.45. An inspection of the correlogram of the series in Fig. 30.5 reveals a striking difference between the two methods of analysis. From the correlogram we should be inclined to suspect a mean period of about 15 years, corresponding to the peak of greatest intensity in the periodogram, with a subsidiary ripple of about 5 to 6 years' period, corresponding to one or more of the peaks in the periodogram. Of the other 18 periods, however, there is no sign. The conclusion is inevitable that either the correlogram is insensitive or the periodogram is misleading. Having raised this highly important question, we shall unfortunately have to leave it partly unsettled; but we shall show that at least three-quarters of the periods thrown up for consideration by the periodogram are not significant.


30.46. The calculation of the intensity depends on that of the quantities $A$ and $B$ of equations (30.67) and (30.68). Suppose in the first place that our trial period $\mu$ is an integer. We then write down the series in rows of $\mu$, thus:

$$\begin{array}{lcccc}
 & u_1 & u_2 & \cdots & u_\mu \\
 & u_{\mu+1} & u_{\mu+2} & \cdots & u_{2\mu} \\
 & \vdots & \vdots & & \vdots \\
 & u_{(p-1)\mu+1} & u_{(p-1)\mu+2} & \cdots & u_{p\mu} \\
\hline
\text{Totals} & m_1 & m_2 & \cdots & m_\mu
\end{array} \qquad (30.73)$$

We continue writing down the rows until there are fewer than $\mu$ terms remaining, the extra terms being left out of account. The number $p\mu$ is then the multiple of $\mu$ nearest to the number $n$ of terms in the series, and may be denoted by $N$. This array is sometimes known as the Buys-Ballot table.

We then form the sum

$$A = \frac{2}{N}\left(m_1\cos\frac{2\pi}{\mu} + m_2\cos\frac{4\pi}{\mu} + \dots + m_\mu\cos\frac{2\pi\mu}{\mu}\right), \qquad (30.74)$$

and this is clearly the quantity $A$ of (30.67) for the series of $N$ terms. Similarly we have

$$B = \frac{2}{N}\left(m_1\sin\frac{2\pi}{\mu} + m_2\sin\frac{4\pi}{\mu} + \dots + m_\mu\sin\frac{2\pi\mu}{\mu}\right). \qquad (30.75)$$

If the trial period $\mu$ is a rational fraction $\nu/\delta$, we write the series down in rows of $\nu$ and proceed in the same way. If it is irrational, or is a number which gives a large value of $\nu$ when expressed as a fraction, we take two convenient neighbouring values of $\mu$ and interpolate in the periodogram.


30.47. In actual practice we do not write down the array (30.73). The sums $m$ may be formed on an adding machine: start with $u_1$ and add every $\mu$th member to give $m_1$; then start with $u_2$ and add every $\mu$th member to give $m_2$; and so on. Alternatively, the values may be written on cards, one for each member of the series, and the pack dealt into $\mu$ heaps. The total of the $m$'s, together with any members left over, equals the sum of the series and provides a check on the work.
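The procedure of 30.46 and 30.47 can be sketched in plain Python. Forming the totals is the "adding every $\mu$th member" step. The sketch then checks that $A$ and $B$ computed from the totals, as in (30.74) and (30.75), agree with the term-by-term sums (30.67) and (30.68) over the same $N$ members. This holds for an integer period and for a rational one such as $13/5$ laid out in rows of 13. The series is hypothetical and the function names are illustrative.

```python
import math

def buys_ballot_totals(u, width):
    """Column totals m_1..m_width of the Buys-Ballot table (30.73).

    The series is laid out in rows of `width`; members that do not fill a
    complete row are left out. Returns the totals and N = rows * width."""
    rows = len(u) // width
    N = rows * width
    # m_i = u_i + u_{i+width} + u_{i+2 width} + ...   (u[0] holds u_1)
    m = [sum(u[i:N:width]) for i in range(width)]
    return m, N

def A_B_from_totals(m, N, mu):
    """(30.74)-(30.75): A and B from the totals, with weights at j = 1..len(m).
    Correct when len(m)/mu is a whole number, e.g. width 13 for mu = 13/5."""
    A = 2.0 / N * sum(x * math.cos(2 * math.pi * j / mu) for j, x in enumerate(m, start=1))
    B = 2.0 / N * sum(x * math.sin(2 * math.pi * j / mu) for j, x in enumerate(m, start=1))
    return A, B

def A_B_direct(u, N, mu):
    """(30.67)-(30.68) summed term by term over the first N members."""
    A = 2.0 / N * sum(u[j - 1] * math.cos(2 * math.pi * j / mu) for j in range(1, N + 1))
    B = 2.0 / N * sum(u[j - 1] * math.sin(2 * math.pi * j / mu) for j in range(1, N + 1))
    return A, B

# Hypothetical series of 300 values (not the Beveridge data).
u = [math.sin(0.37 * j) + 0.5 * math.cos(1.1 * j) + (j * 7919 % 13) / 13 for j in range(1, 301)]

m13, N13 = buys_ballot_totals(u, 13)         # trial period 2.6 = 13/5: 23 rows, N = 299
print(A_B_from_totals(m13, N13, 2.6))
print(A_B_direct(u, N13, 2.6))               # same values, with far more arithmetic
print(A_B_from_totals([8, -8], 300, 2))      # totals about the mean used in Example 30.8
```

The last line takes the totals $m_1 = +8$, $m_2 = -8$ of Example 30.8 and gives $A = -32/300$ and $B = 0$ to rounding error.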


Example 30.8 

Consider the Beveridge series of Table 30.1. For the trial period 2 we may take 300 
terms of the scries, and m, (about zero mean) will be the sum of the values Uyy 
and mi will be the sum of the values with even subscripts. These sums are for the years 
1645 to 1844 inclusive, 

mj = 14,909 ^ 

mg =- 14,893. 

The mean is. 14,901, so that about the mean of the series 

mi = + 8 

mg — — 8. 

Now, for a trial period 2, $\sin \pi j$ vanishes and hence $B = 0$. For $A$ we have (in our notation, which gives different signs from Beveridge's to $A$ and $B$)

$$A = \frac{2}{300}\left(m_1\cos\frac{2\pi}{2} + m_2\cos\frac{4\pi}{2}\right) = \frac{2}{300}(m_2 - m_1) = -\frac{32}{300} = -0.11.$$

Thus

$$S^2\ (\text{corrected}) = A^2 = 0.01,$$

as shown in Table 30.9.
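The arithmetic of Example 30.8 can be re-run in a few lines. The figures are those quoted in the example, and the sketch simply applies (30.74) and (30.75) with $\mu = 2$.

```python
import math

# Figures quoted in Example 30.8 (trial period 2, N = 300 terms).
m1, m2 = 14_909, 14_893
N = 300
mean = (m1 + m2) / 2                      # 14,901
d1, d2 = m1 - mean, m2 - mean             # +8 and -8 about the mean

# (30.74) and (30.75) with mu = 2: only the j = 1, 2 terms occur.
A = 2 / N * (d1 * math.cos(2 * math.pi / 2) + d2 * math.cos(4 * math.pi / 2))
B = 2 / N * (d1 * math.sin(2 * math.pi / 2) + d2 * math.sin(4 * math.pi / 2))
S2 = A ** 2 + B ** 2
```

Here $A = -32/300 \approx -0.107$, which the example rounds to $-0.11$, and $S^2 \approx 0.011$.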

For a trial period 2·600 we could take $\mu = 13/5$ and arrange the series in rows of 13, requiring 23 rows accounting for 299 values of the series. We may, however, save ourselves some arithmetic by taking 24 rows, a multiple of 4, occupying 312 observations. Or rather, we take 6 rows of 52, giving us the values for a trial period 52; then add $m_1$ to $m_{27}$, $m_2$ to $m_{28}$, and so on, giving the result we would have got by taking 12 rows of 26 and hence providing the values for a trial period of 26; then we add again in the same way, and so on, obtaining successively the values of $m$ required for trial periods of 13, 6·5 and 3·25. Similarly, by multiplying the original 52 values of $m$ by the respective values of $\cos(2\pi kj/52)$ and $\sin(2\pi kj/52)$, we get the values of $A$ and $B$ required for a trial period of $52/k$; for instance, $k = 5$ gives 10·4. It is thus evident that we can use the single set of 52 values of $m$ to provide the required constants for trial periods 52/2, 52/3, 52/4 and so forth. This is the main reason why, in Table 30.9, 312 observations are shown as $N$ for the trial periods 2·080, 2·261, 2·364, 2·476, 2·600, 2·737, 2·888, 3·250, 3·714, 4·333, 5·200, 6·500, 7·429, 8·667, 10·400, 13·000, 17·333, 26·000 and 52·000. The arithmetic, though difficult enough, is not as laborious as it appears at first sight.
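The economy described for Table 30.9 can be sketched as follows. The 52 column totals are folded to give the totals for rows of 26 and 13. The same 52 totals, weighted by $\cos(2\pi kj/52)$ and $\sin(2\pi kj/52)$, give $A$ and $B$ for trial period $52/k$. The series below is made up for illustration, and only the bookkeeping is being checked.

```python
import math


def fold(m):
    """Add m_1 to m_{h+1}, m_2 to m_{h+2}, ... where h = len(m) // 2
    (len(m) assumed even): 6 rows of 52 become 12 rows of 26, and so on."""
    h = len(m) // 2
    return [m[j] + m[j + h] for j in range(h)]


def coefficients(m, N, k):
    """A and B for trial period len(m) / k from column totals m."""
    nu = len(m)
    A = 2 / N * sum(mj * math.cos(2 * math.pi * k * j / nu) for j, mj in enumerate(m, 1))
    B = 2 / N * sum(mj * math.sin(2 * math.pi * k * j / nu) for j, mj in enumerate(m, 1))
    return A, B


# Illustrative (made-up) series of 312 terms, i.e. 6 rows of 52.
u = [math.cos(2 * math.pi * t / 10.4) + 0.01 * (t % 7) for t in range(1, 313)]
m52 = [sum(u[j::52]) for j in range(52)]   # totals for trial period 52
m26 = fold(m52)                            # trial period 26
m13 = fold(m26)                            # trial period 13
A, B = coefficients(m52, 312, 5)           # trial period 52/5 = 10.4
```

The same `m13` totals also serve for 13/5 = 2·6, since that is $52/20$.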


30.48. There is an interesting relation between the periodogram and the correlogram by which the latter, in theory, determines the former. We consider, as in 30.38, a function $u(t)$ defined at every point of time in some range $-h$ to $h$. Then

$$a(p) + i\,b(p) = \frac{1}{h}\int_{-h}^{h} e^{ipt}\,u(t)\,dt = \frac{1}{h}\int_{-h}^{h}\cos pt\;u(t)\,dt + \frac{i}{h}\int_{-h}^{h}\sin pt\;u(t)\,dt \qquad (30.76)$$

corresponds to the sums of (30.67) and (30.68) and may be written $A + iB$, where

$$p = \frac{2\pi}{\mu}. \qquad (30.77)$$


It follows that the intensity $S^2$ is related to the Fourier transform of $r(k)$ by the relation, derived from (30.63),

$$S^2 = A^2 + B^2 = \frac{2}{h}\int_{-2h}^{2h} r(k)\,e^{ipk}\,dk, \qquad (30.78)$$

which is true also in the limit, subject to conditions of existence. Thus the intensity is, if $r(k)$ exists over an infinite range, the quantity

$$\lim_{h\to\infty}\frac{2}{h}\int_{-2h}^{2h} r(k)\cos kp\,dk,$$

and, if $R(k)$ exists, the parallel quantity based on

$$\int_{-\infty}^{\infty} R(k)\cos kp\,dk.$$


The periodogram is thus derivable from the autocorrelation function. Since the latter does not uniquely determine the series, the periodogram will not do so either.
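For a finite series observed at $t = 1, \ldots, n$ there is an exact discrete counterpart of (30.78). With $A$ and $B$ formed with the factor $2/n$, $A^2 + B^2 = (4/n^2)\{c_0 + 2\sum_{k \ge 1} c_k \cos kp\}$, where $c_k = \sum_t u_t u_{t+k}$ are the lagged products. Dividing these by $c_0$ gives the serial correlations. The sketch below checks the identity numerically on a made-up series. The function names are illustrative.

```python
import math


def intensity_direct(u, p):
    """S^2 = A^2 + B^2 with A, B = (2/n) * sum of u_t cos pt, u_t sin pt, t = 1..n."""
    n = len(u)
    A = 2 / n * sum(x * math.cos(p * t) for t, x in enumerate(u, 1))
    B = 2 / n * sum(x * math.sin(p * t) for t, x in enumerate(u, 1))
    return A * A + B * B


def intensity_from_products(u, p):
    """The same S^2 from the lagged products c_k = sum_t u_t u_{t+k}:
    S^2 = (4 / n^2) * (c_0 + 2 * sum_{k >= 1} c_k cos kp)."""
    n = len(u)
    c = [sum(u[t] * u[t + k] for t in range(n - k)) for k in range(n)]
    return 4 / n ** 2 * (c[0] + 2 * sum(c[k] * math.cos(k * p) for k in range(1, n)))


u = [math.sin(0.4 * t) + 0.5 * ((7 * t) % 5 - 2) for t in range(1, 61)]  # made up
```

The identity is just the expansion of $|\sum_t u_t e^{ipt}|^2$ as a double sum grouped by lag $t - s = k$. This is why the correlogram carries all the information in the periodogram, but not the phases that would fix the series itself.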


Example 30.9 

Consider the autocorrelation function, which in present notation may be written

$$R(k) = p^{k}\,\frac{\sin(k\theta + \psi)}{\sin\psi}.$$
This, as we have seen, represents the correlogram of an autoregressive series of the simple linear kind involving $u_{t+2}$, $u_{t+1}$ and $u_t$. Writing $p = e^{-q}$, we may express this as

$$R(k) = e^{-q|k|}\,\frac{\sin(|k|\theta + \psi)}{\sin\psi}, \qquad q > 0,$$

since $p$ is less than unity. It is to be remembered that since $R(-k) = R(k)$, the modulus of $k$ is to be used when $k$ is negative.

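We have

$$\int_{-\infty}^{\infty} e^{-q|k|}\,\frac{\sin(|k|\theta+\psi)}{\sin\psi}\,\cos kp\,dk = \frac{1}{\sin\psi}\left\{\frac{(\theta+p)\cos\psi + q\sin\psi}{q^2 + (\theta+p)^2} + \frac{(\theta-p)\cos\psi + q\sin\psi}{q^2 + (\theta-p)^2}\right\}.$$

This determines the intensity in the periodogram of the series, $p$ being the quantity $2\pi/\mu$, not to be confused with our original damping factor $p$.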


It is remarkable that, as $\mu$ becomes large (that is, as $p$ tends to zero), $S^2$ tends to a constant value; that is to say, the periodogram tends to a fixed level, without peaks. From the analogy with the analysis of light-rays into colours (each colour corresponding to a particular harmonic), we may say that the periodogram develops a continuous spectrum. In a very interesting chapter on periodogram analysis Davis (1941) has given a number of examples exhibiting this kind of effect.
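The integral in Example 30.9 can be evaluated in closed form by writing $\sin(k\theta+\psi)\cos kp$ as half the sum of $\sin\{k(\theta+p)+\psi\}$ and $\sin\{k(\theta-p)+\psi\}$. The minimal Python sketch below compares that closed form with Simpson's rule for arbitrarily chosen $q$, $\theta$ and $\psi$. It checks only the integral, not any constant relating it to $S^2$.

```python
import math


def integrand(k, q, theta, psi, p):
    # R(k) cos kp for k >= 0, with R(k) = e^{-qk} sin(k theta + psi) / sin psi
    return math.exp(-q * k) * math.sin(k * theta + psi) / math.sin(psi) * math.cos(k * p)


def transform_numeric(q, theta, psi, p, upper=80.0, n=40_000):
    """Composite Simpson approximation to the integral of R(k) cos kp over
    the whole line, using evenness in k: twice the integral from 0 to `upper`."""
    h = upper / n
    s = integrand(0.0, q, theta, psi, p) + integrand(upper, q, theta, psi, p)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * integrand(i * h, q, theta, psi, p)
    return 2 * s * h / 3


def transform_closed(q, theta, psi, p):
    """Closed form: sum over c = theta +/- p of (c cos psi + q sin psi)/(q^2 + c^2),
    divided by sin psi."""
    total = 0.0
    for c in (theta + p, theta - p):
        total += (c * math.cos(psi) + q * math.sin(psi)) / (q * q + c * c)
    return total / math.sin(psi)
```

At $p = 0$ the closed form reduces to $2(\theta\cos\psi + q\sin\psi)/\{(q^2+\theta^2)\sin\psi\}$, which is the level the curve approaches for long trial periods.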

Significance of a Periodogram

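30.49. Suppose that the values $u_1, u_2, \ldots, u_n$ are random elements from a normal population with variance $\sigma^2$. Then the function

$$A = \frac{2}{n}\sum_{j=1}^{n} u_j\cos\frac{2\pi j}{\mu}$$

is normally distributed with variance

$$\operatorname{var} A = \frac{4\sigma^2}{n^2}\sum_{j=1}^{n}\cos^2\frac{2\pi j}{\mu} \simeq \frac{2\sigma^2}{n}, \qquad (30.79)$$

and similarly

$$\operatorname{var} B \simeq \frac{2\sigma^2}{n}. \qquad (30.80)$$

We also see that $\operatorname{cov}(A, B) = 0$, so that $A$ and $B$ are independent. Hence the joint distribution of $A$ and $B$ is

$$dF = \frac{n}{4\pi\sigma^2}\exp\left\{-\frac{n}{4\sigma^2}(A^2 + B^2)\right\}dA\,dB. \qquad (30.81)$$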


Thus the distribution of $S^2 = A^2 + B^2$ is

$$dF = \frac{n}{4\sigma^2}\exp\left(-\frac{nS^2}{4\sigma^2}\right)d(S^2). \qquad (30.82)$$

The probability that $S^2$ exceeds $4\sigma^2\kappa/n$ in value is immediately obtainable as $e^{-\kappa}$.
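The result of 30.49 can be checked by simulation. The sketch below draws normal series, forms $A$ and $B$ at an arbitrary trial period, and counts how often $S^2$ exceeds $4\sigma^2\kappa/n$. The parameter values and the seed are arbitrary choices.

```python
import math
import random


def exceedance_rate(n, sigma, mu, kappa, trials, seed=0):
    """Fraction of simulated normal series whose intensity S^2 = A^2 + B^2
    at trial period mu exceeds 4*sigma^2*kappa/n."""
    rng = random.Random(seed)
    cos_t = [math.cos(2 * math.pi * j / mu) for j in range(1, n + 1)]
    sin_t = [math.sin(2 * math.pi * j / mu) for j in range(1, n + 1)]
    threshold = 4 * sigma ** 2 * kappa / n
    hits = 0
    for _ in range(trials):
        u = [rng.gauss(0.0, sigma) for _ in range(n)]
        A = 2 / n * sum(x * c for x, c in zip(u, cos_t))
        B = 2 / n * sum(x * s for x, s in zip(u, sin_t))
        if A * A + B * B > threshold:
            hits += 1
    return hits / trials


rate = exceedance_rate(n=200, sigma=1.5, mu=7.3, kappa=1.0, trials=4000)
print(rate, math.exp(-1.0))  # compare the simulated rate with e^{-kappa}
```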


30.50. This result is due to Schuster (1898), but it gives only the probability that a value of $S^2$ chosen at random will exceed a given value; whereas in the periodogram we deliberately pick out the biggest values for inspection. Walker (1914) pointed out that if $e^{-\kappa}$ is small, the probability that none of $m$ independent values of $S^2$ should exceed $4\sigma^2\kappa/n$ is $(1 - e^{-\kappa})^m$, so that the probability that at least one should exceed that amount is

$$1 - (1 - e^{-\kappa})^m. \qquad (30.83)$$

Davis (1941) gives tables of this function.
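Inverting (30.83) gives the value of $\kappa$ that the largest of $m$ independent intensities must exceed to be significant at level $\alpha$. A minimal sketch follows, with $\alpha$ and $m$ as inputs. For $m = 305$ and $\alpha = 0·05$ it gives $\kappa$ of about 8·7. The function names are illustrative.

```python
import math


def walker_probability(kappa, m):
    """(30.83): chance that at least one of m independent intensities exceeds
    kappa times the mean intensity 4*sigma^2/n."""
    return 1 - (1 - math.exp(-kappa)) ** m


def walker_critical_kappa(alpha, m):
    """Solve 1 - (1 - e^{-kappa})^m = alpha for kappa."""
    return -math.log(1 - (1 - alpha) ** (1 / m))


k05 = walker_critical_kappa(0.05, 305)
```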

30.51. Both the Schuster and the Walker tests depend on a knowledge of $\sigma^2$. Since the mean value of $S^2$ in (30.82) is $4\sigma^2/n$, the usual procedure is to consider the test as a comparison of $S^2$ with $E(S^2)$; but $\sigma^2$ itself has to be estimated from the original data.


30.52. Fisher (1929a) has given a test which avoids the inexactitude due to the estimation of $\sigma^2$. If $S^2$ is the largest of $\nu$ intensities, and their mean $\sum S^2/\nu$ is used as the estimate of the mean intensity, then the probability that

$$g = \frac{S^2}{\sum S^2} \qquad (30.84)$$

will exceed a given value $g$ is

$$\nu(1 - g)^{\nu-1} - \binom{\nu}{2}(1 - 2g)^{\nu-1} + \cdots + (-1)^{m-1}\binom{\nu}{m}(1 - mg)^{\nu-1}, \qquad (30.85)$$

where $\nu = \tfrac12(n - 1)$, $n$ being the (odd) number of observations, and $m$ is the greatest integer less than $1/g$. The result was extended by Stevens (1939a); see also Fisher (1940a) and Finney (1941a). Davis (1941) also gives tables of this function.
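A minimal sketch of Fisher's series (30.85). It sums terms while $1 - jg$ is positive, which is the same as stopping at the greatest integer $m$ below $1/g$. The function name is illustrative.

```python
from math import comb


def fisher_g_pvalue(g, nu):
    """P(largest of nu intensities / their sum > g) under random normal
    variation, by the alternating series (30.85)."""
    total = 0.0
    j = 1
    while j <= nu and 1 - j * g > 0:
        total += (-1) ** (j - 1) * comb(nu, j) * (1 - j * g) ** (nu - 1)
        j += 1
    return total
```

For $\nu = 2$ the ratio $g$ is the larger of $U$ and $1 - U$ with $U$ uniform, so the chance that it exceeds 0·6 is 0·8, which the series reproduces.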


30.53. All the tests we have described are based on random normal variation in the original series; but in practice nobody would embark on the labour of a periodogram analysis unless he had satisfied himself that the data were not random. It seems to me, therefore, that these tests are really off the main point, being tests based on a hypothesis which we have already rejected. They are not without their usefulness, however. We may assume with some confidence that if a particular intensity in the series is not shown as significant on the hypothesis of random variation, it is not significant when the series is systematic. What does not follow is that if one intensity is significant then others must be so, even if they exceed the significance values; for they are not independent of the significant value, at least for short series. What we ought to do, perhaps, is to extract the component which is considered significant from the series and then analyse the remainder; and so on as long as significant terms appear. But this is hardly a practical computational possibility. Tests of significance in the periodogram, as in the correlogram, remain undiscovered.
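The extract-and-reanalyse procedure suggested above can be sketched as follows. This rests on two assumptions: the trial periods are integers on a fixed grid, and each component is removed by subtracting its harmonic $A\cos(2\pi t/\mu) + B\sin(2\pi t/\mu)$. No significance test is applied; the number of steps is simply fixed. The function names and the test series are hypothetical.

```python
import math


def harmonic(u, mu):
    """A and B with the factor 2/n, for trial period mu."""
    n = len(u)
    A = 2 / n * sum(x * math.cos(2 * math.pi * t / mu) for t, x in enumerate(u, 1))
    B = 2 / n * sum(x * math.sin(2 * math.pi * t / mu) for t, x in enumerate(u, 1))
    return A, B


def extract_components(u, periods, steps):
    """Repeatedly take the trial period with the largest intensity A^2 + B^2,
    subtract the fitted harmonic at that period, and re-analyse the remainder."""
    residual = list(u)
    found = []
    for _ in range(steps):
        best = max(periods, key=lambda mu: sum(c * c for c in harmonic(residual, mu)))
        A, B = harmonic(residual, best)
        residual = [x - A * math.cos(2 * math.pi * t / best) - B * math.sin(2 * math.pi * t / best)
                    for t, x in enumerate(residual, 1)]
        found.append(best)
    return found, residual


# Made-up series: periods 12 and 5, 120 terms (a whole number of both cycles).
u = [3 * math.cos(2 * math.pi * t / 12) + 1.5 * math.sin(2 * math.pi * t / 5) for t in range(1, 121)]
found, residual = extract_components(u, range(2, 41), steps=2)
```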
Example 30.10

Let us examine the significance of the 20 periods of the Beveridge periodogram given in 30.44.

Sir William gave the value of $4\sigma^2/n$ in his original paper as 5·898. Expressing the intensities as a multiple $\kappa$ of this amount, we find:


 Period      κ        Period      κ
  2·736     1·33      11·000     5·74
  3·417     2·69      12·000     3·?5
  4·417     2·79      12·800     7·80
  5·100     7·18      15·250    12·91
  5·415     4·01      17·333     9·26
  5·667     5·65      20·000     6·42
  5·933     4·01      24·000     4·43
  7·417     3·68      36·000     3·95
  8·091     3·94      54·000     4·42
  9·760     5·76      68·000     2·30



There are 305 trial periods in Table 30.9. Let us consider the probability that at least one of 305 independent values of $\kappa$ will exceed given values, that is to say, the probabilities given by (30.83). We find:

  κ      Probability
  2        1·000
  4        0·996
  6        0·531
  8        0·097
 10        0·014

On this basis we should be inclined to attribute significance to the period 15·25, for which $\kappa = 12·91$. We have no right to be surprised that at least one value exceeds $\kappa = 6$. If we take this value as the critical one, only the periods 5·100, 12·800, 15·250, 17·333 and 20·000 would be significant, that is to say, five out of 20.

Again, since $e^{-5} \simeq 0·007$, we should expect to find in 305 independent members two in excess of $\kappa = 5$. Actually there are eight. But they are not independent, and we cannot rely on this comparison to say that six are significant. On the whole, however, it looks as if at least three-quarters of the periods are not significant, and possibly more. The example illustrates the difficulty of testing the significance of the periodogram as a whole.
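The probabilities and the counts used in Example 30.10 can be recomputed from (30.83). The $\kappa$ values are those of the example's table, with the entry for period 11·000 read as 5·74. That reading is consistent with the text's counts of five values above 6 and eight above 5. The partly illegible entry for 12·000, which lies below 5, is left out.

```python
import math


def walker(kappa, m):
    """(30.83): chance that at least one of m independent intensities
    exceeds kappa times their expected value 4*sigma^2/n."""
    return 1 - (1 - math.exp(-kappa)) ** m


m = 305
table = {k: round(walker(k, m), 3) for k in (2, 4, 6, 8, 10)}

# kappa for the Beveridge periods of Example 30.10 (period 12.000 omitted).
kappa = {2.736: 1.33, 3.417: 2.69, 4.417: 2.79, 5.100: 7.18, 5.415: 4.01,
         5.667: 5.65, 5.933: 4.01, 7.417: 3.68, 8.091: 3.94, 9.760: 5.76,
         11.000: 5.74, 12.800: 7.80, 15.250: 12.91, 17.333: 9.26,
         20.000: 6.42, 24.000: 4.43, 36.000: 3.95, 54.000: 4.42, 68.000: 2.30}
over_6 = sorted(p for p, k in kappa.items() if k > 6)
over_5 = sum(k > 5 for k in kappa.values())
expected_over_5 = m * math.exp(-5)     # about 2, as in the text
```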


Lag Correlation 

30.54. The idea of serial correlation can be extended to the joint variation of two series. If we have two series $u(t)$, $v(t)$ in standard measure, we may define the lag correlation of order $k$ as

$$r(k) = \int u(t)\,v(t + k)\,dt, \qquad (30.86)$$

where the integral includes summation in the case when the series are specified at equidistant points of time. We note that in this case $r(k)$ is not equal to $r(-k)$, and $r(0)$ is not unity.
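For series observed at equidistant times, a minimal sketch of (30.86) replaces the integral by the mean of $u_t v_{t+k}$ over the overlapping terms of the two standardised series. That averaging convention is an assumption, and the test data are simulated. Shifting one series against the other shows why $r(k) \ne r(-k)$.

```python
import math
import random


def standardise(x):
    """Express a series in standard measure (mean 0, variance 1)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return [(v - mean) / sd for v in x]


def lag_correlation(u, v, k):
    """Mean of u_t * v_{t+k} over the overlapping terms of two series,
    each in standard measure; k may be negative."""
    u, v = standardise(u), standardise(v)
    n = len(u)
    pairs = [(u[t], v[t + k]) for t in range(n) if 0 <= t + k < n]
    return sum(a * b for a, b in pairs) / len(pairs)


# v lags two steps behind u: v_{t+2} is the same value as u_t.
rng = random.Random(0)
w = [rng.gauss(0.0, 1.0) for _ in range(202)]
u, v = w[2:], w[:-2]
```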
Table 30.10 shows the lag correlations between two series of English wheat prices and horse populations (for the original series see Kendall, 1944a). The data are shown as a lag correlogram in Fig. 30.10.


TABLE 30.10

Lag Correlations for Two Series of English Wheat Prices and Horse Populations (Deviations from a Simple Nine-Year Average)

(The order of the correlation is the number of years by which horse population lags behind wheat price, e.g. $r_{10}$ is the correlation of wheat price with the horse population of ten years earlier.)


 Order of correlation k    r_k       Order of correlation k    r_k
          -10             -0·22                 1              0·24
           -9             -0·19                 2             -0·36
           -8             -0·24                 3             -0·12
           -7             -0·16                 4              0·16
           -6             -0·09                 5              0·17
           -5              0·07                 6              0·39
           -4              0·27                 7              0·36
           -3              0·31                 8              0·16
           -2              0·41                 9             -0·16
           -1              0·25                10             -0·44
            0             -0·12



Fig. 30.10. Lag Correlation of Wheat Prices and Horse Populations (Table 30.10).




The systematic appearance is unmistakable and we notice in particular that the maximum 
correlation occurs between the wheat price and the horse population of two years later. 
This bears the obvious explanation that when a farmer earns more he buys or breeds more 
horses ; but it does not follow logically that this must be so or that there need be any 
causal nexus between the two series. If two autoregressive series are oscillating with 
mean periods which are close together and only a short span of experience is available for 
scrutiny, then lag correlations of the damped sinusoidal type may appear, as it were, by 
accident. 


30.55. We have now reached the end of our account of the statistical analysis of time-series and the end of this book; and the final words we have to say of the one will apply generally to the other. Much has been left unsaid, partly from lack of space, partly from deficiencies in the present state of knowledge, and partly from a desire not to overburden the reader. We have not avoided mathematical analysis where it was necessary to advance the argument; but we have insisted on the expression of results in numerical form and the necessity of experimental confirmation whenever it could be obtained. That there are gaps in the treatment we have given, and unexplored branches of the subject to which we have barely referred, is not entirely a matter of regret; for the over-early and peremptory reduction of knowledge into arts and methods is one of the errors which Bacon cautioned us against more than 300 years ago. Much remains to be done; and this book will have served its purpose if the reader is left with the desire to do some of it himself.


NOTES AND REFERENCES 

The theoretical aspects of the autoregressive series and of moving averages are discussed in Wold's book on The Analysis of Stationary Time-Series (1938a). The basic memoir is that by Yule (1927a) on sunspots. For applications to meteorology see Walker (1931) and to economics Kendall (1944a). Davis's book on The Analysis of Economic Time Series (1941) contains a great deal of interesting material but should not be read uncritically. Two earlier papers by Yule (1921 and 1926) are also of interest. See also my paper on "The Analysis of Oscillatory Time-Series" in the Journal of the Royal Statistical Society for 1946, a paper by Yule in the same journal, my brochure (in press) on "Researches in Oscillatory Time-Series", and a symposium introduced by Bartlett in the Supplement to the Journal for 1946.

The classical work on periodogram analysis is that of Schuster (1898). The books by Brunt (1931) on The Combination of Observations and by Whittaker and Robinson (1940) on The Calculus of Observations contain useful introductory accounts; and Davis's book referred to above has an excellent chapter illustrated with an unusual number of examples. Papers by Crum (1923) and Greenstein (1936) are of interest. The papers by Sir William Beveridge (1921, 1922) on wheat prices and rainfall have been justly described by Davis as a heroic piece of periodogram analysis. Tables facilitating the calculation of intensities were published by Turner (1913), and more complete tables will be given in my brochure referred to above. See also the book by Stumpff (1937).

Various short-cut methods of periodogram analysis have been proposed by several 
authors, e.g. Oppenheim (1909), Bruns (1921) and Alter (1933, 1937) ; but their value is 
problematical. There is a useful memoir by Bartels (1936) which is worth studying. 





EXERCISES 


30.1. For the autoregressive series

$$u_{t+2} + a\,u_{t+1} + b\,u_t = \epsilon_{t+2}$$

show that if $\epsilon$ is a random variable and the series is long,

$$\frac{\operatorname{var} u}{\operatorname{var}\epsilon} = \frac{1 + b}{(1 - b)\{(1 + b)^2 - a^2\}},$$

and hence that the variance of the generated series may be much greater than that of $\epsilon$ itself.
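Exercise 30.1 can be checked numerically by writing $u_t = \sum_j \psi_j \epsilon_{t-j}$, with $\psi_0 = 1$, $\psi_1 = -a$ and $\psi_j = -a\psi_{j-1} - b\psi_{j-2}$, so that $\operatorname{var} u/\operatorname{var}\epsilon = \sum \psi_j^2$. The sketch below compares this sum with the stated ratio. The coefficient values are arbitrary stationary choices; the second pair gives a ratio above 50.

```python
def variance_ratio_formula(a, b):
    """(1 + b) / {(1 - b) ((1 + b)^2 - a^2)}, as stated in Exercise 30.1."""
    return (1 + b) / ((1 - b) * ((1 + b) ** 2 - a ** 2))


def variance_ratio_series(a, b, terms=5000):
    """var u / var eps as the sum of squared weights psi_j in
    u_t = sum_j psi_j eps_{t-j}; psi_0 = 1, psi_1 = -a,
    psi_j = -a psi_{j-1} - b psi_{j-2} (from the defining equation)."""
    psi_prev, psi = 1.0, -a
    total = psi_prev ** 2 + psi ** 2
    for _ in range(terms):
        psi_prev, psi = psi, -a * psi - b * psi_prev
        total += psi ** 2
    return total
```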


30.2. For the autoregressive series of the previous exercise use the relation

$$\rho_{k+2} + a\,\rho_{k+1} + b\,\rho_k = 0, \qquad k \ge -1,$$

to derive the relation

$$\rho_k = p^k\,\frac{\sin(k\theta + \psi)}{\sin\psi}.$$
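A numerical check of Exercise 30.2 is sketched below. It assumes the usual parametrisation $p = \sqrt{b}$, $\cos\theta = -a/(2p)$ and $\tan\psi = \{(1 + p^2)/(1 - p^2)\}\tan\theta$, which is not restated in this section, for the oscillatory case $a^2 < 4b$. The recursion starts from $\rho_0 = 1$ and $\rho_1 = -a/(1 + b)$, which is the relation at $k = -1$ with $\rho_{-1} = \rho_1$.

```python
import math


def rho_recursive(a, b, kmax):
    """rho_0 = 1, rho_1 = -a/(1+b), then rho_{k+2} = -a rho_{k+1} - b rho_k."""
    rho = [1.0, -a / (1 + b)]
    while len(rho) <= kmax:
        rho.append(-a * rho[-1] - b * rho[-2])
    return rho


def rho_closed(a, b, k):
    """p^k sin(k theta + psi) / sin psi under the assumed parametrisation
    (oscillatory case a^2 < 4b, 0 < b < 1)."""
    p = math.sqrt(b)
    theta = math.acos(-a / (2 * p))
    psi = math.atan2((1 + p * p) * math.sin(theta), (1 - p * p) * math.cos(theta))
    return p ** k * math.sin(k * theta + psi) / math.sin(psi)
```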


30.3. If the estimated coefficients $a'$ and $b'$ in the autoregressive scheme are reduced in the manner of 30.32 by a superposed error, show that



(Yule, 1927a.) 


30.4. Show that if, in the autoregressive scheme of Exercise 30.1, $b = 1$, the series becomes undamped and the correlogram reduces to a simple harmonic. Examine the effect on the solution (30.23).


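30.5. If any series has fitted to it a series generated by the scheme of Exercise 30.1, $a$ and $b$ being any constants, show that for the serial correlations of the residuals, say $\rho'_k$, we have

$$\rho'_k = \frac{(1 + a^2 + b^2)\rho_k + a(1 + b)(\rho_{k-1} + \rho_{k+1}) + b(\rho_{k-2} + \rho_{k+2})}{1 + a^2 + b^2 + 2a(1 + b)\rho_1 + 2b\rho_2}.$$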


30.6. Show that the series with an autocorrelation function

$$r(k) = \frac{\sin\lambda k}{\lambda k}$$

has a periodogram which is zero for periods less than $2\pi/\lambda$ and has a constant ordinate for periods greater than $2\pi/\lambda$, i.e. has a continuous spectrum.


30.7. In equation (30.71), noting that the dominant term vanishes for $a - 2\pi/\mu = 2m\pi/n$, where $m$ is an integer, show that for such a "vanishing" trial period $\mu$, [illegible] approximately. Hence the width of a peak in the periodogram is approximately [illegible], and the main peak will be flanked by smaller peaks of the same width. (This "side-band" effect is another complication in the interpretation of the periodogram, but not apparently a very serious one.)


30.8. If a series of values $x_1, x_2, \ldots, x_n$ is supplemented by a number of zeros, as $\ldots, 0, 0, x_1, x_2, \ldots, x_n, 0, 0, \ldots$, as far as is necessary, and the resulting series differenced, show that

$$S_j = \binom{2j}{j}P_0 - 2\binom{2j}{j-1}P_1 + 2\binom{2j}{j-2}P_2 - \cdots + 2(-1)^j P_j,$$

where $S_j$ is the sum of squares of $j$th differences and $P_i = \sum_k x_k x_{k+i}$. Hence show that the arithmetic of serial correlation may be related to that of the variate-difference method, and vice versa.
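The identity of Exercise 30.8 can be verified on any short series. Pad the series with $j$ zeros at each end, difference it $j$ times, and compare the sum of squares with the combination of lagged products $P_i = \sum_k x_k x_{k+i}$. A minimal sketch follows; the series and the helper names are arbitrary.

```python
from math import comb


def difference(seq):
    """First differences of a list."""
    return [b - a for a, b in zip(seq, seq[1:])]


def sum_sq_jth_differences(x, j):
    """S_j: pad x with j zeros on each side, difference j times, sum squares."""
    padded = [0.0] * j + list(x) + [0.0] * j
    for _ in range(j):
        padded = difference(padded)
    return sum(d * d for d in padded)


def lagged_products(x, i):
    """P_i = sum_k x_k x_{k+i}."""
    return sum(x[k] * x[k + i] for k in range(len(x) - i))


def s_from_products(x, j):
    """Right-hand side of Exercise 30.8 in terms of P_0, ..., P_j."""
    total = comb(2 * j, j) * lagged_products(x, 0)
    for d in range(1, j + 1):
        total += 2 * (-1) ** d * comb(2 * j, j - d) * lagged_products(x, d)
    return total


x = [3.0, -1.0, 4.0, 1.5, -5.0, 9.0, 2.0]
```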


30.9. Show that the serial correlations of a long series obtained by differencing a random series $m$ times are given by

$$r_k = (-1)^k\,\frac{m(m-1)\cdots(m-k+1)}{(m+1)(m+2)\cdots(m+k)},$$

and hence that the correlogram of such a series oscillates.

(Yule, 1921.)
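For Exercise 30.9, the $m$th difference of a random series is a moving sum with weights $(-1)^{m-r}\binom{m}{r}$. Its serial correlations therefore follow from those weights alone. The sketch below compares them with the stated product formula.

```python
from math import comb, prod


def difference_weights(m):
    """Weights of the m-th difference of a series: (-1)^(m-r) * C(m, r)."""
    return [(-1) ** (m - r) * comb(m, r) for r in range(m + 1)]


def correlation_from_weights(m, k):
    """Serial correlation at lag k of the m-th difference of a random
    (uncorrelated, equal-variance) series, from the weights alone."""
    w = difference_weights(m)

    def autocov(lag):
        if lag > m:
            return 0
        return sum(w[r] * w[r + lag] for r in range(m + 1 - lag))

    return autocov(k) / autocov(0)


def correlation_formula(m, k):
    """(-1)^k m(m-1)...(m-k+1) / {(m+1)(m+2)...(m+k)}, as in Exercise 30.9."""
    num = prod(m - i for i in range(k))
    den = prod(m + i for i in range(1, k + 1))
    return (-1) ** k * num / den
```

The alternating sign $(-1)^k$ is what makes the correlogram oscillate.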




30.10. The Whittaker periodogram. Writing

$$\eta^2(\mu) = \frac{\operatorname{var} m}{\operatorname{var} u},$$

where var $u$ is the variance of the series and var $m$ is the variance of the sums $m$ of (30.73), show that if

$$u_j = a\sin\frac{2\pi j}{\lambda} + b_j,$$

where $b_j$ is uncorrelated with periodic terms, then [illegible].

Hence show that, in the neighbourhood of $\lambda$, the graph of $\eta$ (Whittaker's periodogram) has a peak of breadth [illegible], flanked by smaller peaks.

(Whittaker, Month. Notices R. Astr. Soc., 1911, 71; cf. Whittaker and Robinson, Calculus of Observations.)
ADDENDA TO VOLUME I

(1) Frequency and Distribution Functions 

An interesting paper by Burr (1942) considers the possibility of fitting elementary mathematical functions, not to the frequency function as has been the almost universal practice hitherto, but directly to the distribution function. This approach seems to merit further attention. In general, the distribution function has fewer analytical peculiarities than the frequency function (for instance, it cannot be infinite), and in applications to sampling it is the former which is nearly always required. The frequency function can, of course, be derived from the distribution function to a close approximation by differencing or differentiation, processes which are usually easier to carry out than the inverse processes of integration.

(2) Extension of the Carleman Criterion (4.22)

Cramér and Wold (1936) have extended Carleman's criterion for uniqueness in the problem of moments in the following form: If

$$\lambda_{2n} = \mu_{2n,0,\ldots,0} + \mu_{0,2n,\ldots,0} + \cdots + \mu_{0,0,\ldots,2n},$$

the distribution is completely determined by its moments if

$$\sum_{n=1}^{\infty}\lambda_{2n}^{-1/(2n)}$$

diverges. It is rather interesting that the criterion is independent of the product-moments.

(3) Convergence of Series Leading to Standard Errors 

The usual type of expansion in differentials, exemplified in 9.6, raises a point of mathe- 
matical difficulty in that the differentials themselves and the remainder terms, though 
usually small, may sometimes be large for sampling reasons, however large the sample. 
The necessary rigorisation of the process has been given by Derkson (1939) in terms of the 
notion of stochastic convergence, that is to say, a sort of statistical convergence in which
the series converges nearly always in a precisely defined sense. 

(4) Moments of Moments for Finite Populations 

The formulae for moments of the mean and variance in samples from a finite population 
were stated without proof in 11.26. It is obvious that if in these results we let N, the 
population number, tend to infinity, we obtain the formulae for sampling from an infinite 
population. Irwin and I (1944) have recently shown that the process may be reversed 
and the formulae for the finite case derived from those for the infinite case. This offers the simplest and most direct method known to me of deriving the formulae. Reference may also be made to Sukhatme, "On Bipartitional Functions" (Phil. Trans., 1938, A, 237, 375) and "Moments and Product-Moments of Moment-Statistics for Samples of the Finite and Infinite Populations" (Sankhyā, 1944, 6, 363).
(5) Tied Ranks

In the treatment of rank correlation in Chapter 16 it was assumed that ranking was always possible; but in practice cases occur when two or more individuals "tie", and the ranks have to be equalised in some way. This possibility introduces the most intractable complications into theoretical work, but sometimes ties occur so frequently that a systematic method of dealing with them is necessary. The subject has been reviewed and reconsidered by Woodbury (1940) and more recently by myself (Biom., 1946, 33, part 3).

(6) Coefficients of Rank Correlation 

Daniels (1944) has recently unified the theory of rank correlation by showing that Spearman's $\rho$, my $\tau$ and the product-moment coefficient are particular cases of a general coefficient. In particular he has demonstrated the formula for the covariance of $\rho$ and $\tau$, given in 16.24 as very probably true.
BIBLIOGRAPHY

The following Bibliography has no pretensions to completeness in spite of its length. 
It contains about half the titles recorded in my own notes, which themselves are doubtless 
far from comprehensive. Nevertheless, I hope it will be useful to those readers who want 
to take their studies of particular subjects somewhat further. By consulting the references 
given here and following up the references which they themselves provide, it should be 
possible for the reader to acquaint himself with most of what is known, or at least with 
what is worth knowing, about a particular topic. 

The names of authors are not included in the Index (pages 504 ff.) unless they occur 
in the text, since the Bibliography itself is arranged alphabetically under authors’ names. 
The subjects, however, are indexed, and anyone wishing to consult references on a par- 
ticular topic should refer in the first place to the Index, which in turn will refer to the 
authors who have dealt with the matter in question. 

In general the Bibliography contains only references to theoretical papers ; applica- 
tions and illustrative material are included only when some theoretical point is involved. 
Papers which have been superseded by later work are omitted, except where they have 
a historical interest. 
may be incompletely represented. Where possible, the references have been checked
against the original publications, but here also I have had to rely on second-hand references 
in cases where the original papers were inaccessible. 

Note. Names beginning with de, del, le, St., van, von, etc., are entered under those titles, i.e. the order is strictly alphabetical.

Abernethy, J. R. (1933). On the elimination of systematic errors due to grouping. Ann. Math. Stats., 4, 263.

Ackermann, W. Q. (1939). Eine Erweiterung des Poissonschen Grenzwertsatzes und ihre 
Anwendung auf die Risikoprobleme in der Sachversicherung. Schrift. math. Inst. Berlin, 
4, 211. 
K. Pearson (1913d) 484, Yule (1900, 1912)
502.

Asymmetrical frequency-distributions, Bibl., Hansmann
(1934) 467. See also Gram-Charlier
Series, Pearson Distributions.

Asymptotic distributions, Bibl., Hartman and
others (1939) 468, Haviland (1939) 468.
See also Convergence in Probability.

Attributes, significance in k samples, 119-20. 

, sub-sampling for, Bibl., Bartlett (1937a) 445. 

Autocorrelation, see Serial Correlation, Correlo- 
gram. 

function, 421-3. 

Autoregression equations, 399 ; (Table 30.4) 401 ; 
406-8 ; period of, 414-21. See also Serial 
Correlation, Correlogram. 

Average, accuracy of, Bibl.: Bowley (1912) 448,
Keynes (1911) 473. See also Mean, Median,
Mode.

Balance, in design, 263-5. Bibl. : R. C. Bose
(1939) 448, R. C. Bose and Nair (1939) 448,
R. C. Bose (1942a) 448, Cox (1940) 453,
K. R. Nair and Rao (1942) 479, Neyman and
Pearson (1938d) 480, E. S. Pearson (1937b,
1938) 483, "Student" (1938) 493, Weiss
and Cox (1939) 498, Yates (1938a, 1940)
502.

Barbacki, S., N.R., 266. 

Barley yields, (Table 29.1, Figure 29.1) 364. 

Barnard, M. M., (Example 28.3) 345-8 ; N.R., 359. 
Bartels, J., N.R., 437.

Bartlett, M. S., distribution of t, 103 ; conditional 
tests, 127 ; k samples, 299, 323 ; stabilising 
variance, 207-8; Wishart’s distribution, 
333. Exercises from : (21.7) 139, (21.10) 
139, (21.11, 21.13, 21.14) 140, (27.2) 326, 
(28.2) 360, (28.12) 362. N.R., 45, 83, 94, 
136, 245, 304, 359, 437. 

Bayes' theorem and postulate, in estimation, 58-9 ;
in relation to fiducial inference, 90-1, 93-4.
Bibl. : Bayes (1763) 446, Berkson (1930)
446, Burnside (1924) 450, Molina (1931)
478, E. S. Pearson (1925) 482, K. Pearson
(1920a) 485, von Mises (1938) 497, Wishart
(1927) 500.

Beall, G., N.R., 216.

Behrens' test, 82, 91-4, 111-12. See Two Samples.

Belonging coefficient, Bibl., Kullback (1935c) 474.

Bessel function distribution, (Exercise 28.2) 359- 
60. Bibl. : R. C. Bose (1938a) 448, S. S. 
Bose (1938a) 448, Fieller (1932a) 460, 
McKay (1932) 477, K. Pearson (1933a) 486, 
K. Pearson and others (1932a) 486. 

Best critical regions, 272, 275-8. 

Beta (measure of skewness and kurtosis), Bibl., 
McKay (1933) 477. 

Beta-function, Bibl., Muller (1931) 479, Thompson 
and others (1941a) 494. 

Beveridge, Sir William, (Table 30.1) 396, N.R., 
437. See Wheat-price Index. 

Bias, in estimation, 3-4 ; in statistical tests,
307-27. Bibl. : Daly (1940) 454, Neyman
and Pearson (1936, 1938) 480, Neyman
(1936b) 480, Yates (1935a) 502.

Bimodal distributions, transformations of, Bibl., 
Baker (1930a) 444. 

Binomial, confidence intervals for, (Example 19.2) 
66-9 ; tables of, 81. 

, generally, Bibl. : Ayyangar (1934) 444,
Camp (1924) 450, Clopper and Pearson
(1934) 452, Cochran (1936a, 1937a, 1940b)
452, Fisher (1941b) 462, Greenwood and
Yule (1920) 466, Kullback (1935b) 474,
Lurquin (1937) 476, K. Pearson (1915b)
484, Romanovsky (1923) 489.

Biological assays, Bibl., Irwin (1937b) 470.

Births, proportion of males in, (Example 21.8) 120. 

Biserial coefficients, Bibl. : Newbold (1926) 479, 
K. Pearson (1909, 1910) 484, (1917) 486, 
Soper (1914) 492. 

Bishop, D. J., N.R., 304, 369. 

Bivariate surfaces, Bibl. : Narumi (1923a) 479, 
Nicholson (1943) 481, Pretorius (1930) 487, 
Rhodes (1923, 1926) 488, Ritchie-Scott 
(1921) 489, Villars and Anderson (1943) 
496. 

Blocks, randomised, 213-14. Bibl. : R. C. Bose
(1939) 448, R. C. Bose and Nair (1939) 448,
R. C. Bose (1942a) 448, Cornish (1940a, b, c)
453, Cox (1940) 453, Fisher (1940b, 1942a)
462, Goulden (1937) 466, Kishen (1942)
473, Nair and Rao (1942) 479, Nair (1943)
479, Savur (1939) 490, Yates (1936b, 1939a,
1940) 502.

Bose, C., N.R., 266. 

Bose, R. C., N.R., 359. 

Bowley, A. L., N.R., 266. 

Brady, J., N.R., 246. 

Brandt, A. E., (Example 24.1) 221-5, N.R., 245. 
Breeds of pig, (Example 24.1) 221-5, (Example 
24.2) 225, (Example 24.3) 226-7, (Example 

24.4) 229. 

Brookner, R. J., N.R., 304. 

Brown, G. W., bias in tests, 323, N.R., 304.
Brown-Spearman formula, Bibl., Wherry (1935) 
499. 

Bruns, H., N.R., 437. 

Brunt, D., rainfall data, (Table 29.4) 367, N.R., 
437. 

Burr, I. W., distribution functions, 440. 
Buys-Ballot table, 430. 

Calculating machines, Bibl. : Comrie (1936) 452, 
Hoy (1938) 468, Mallock (1933) 477. 
Canonical correlations, 348-58. Bibl. : Bartlett
(1941) 445, Hotelling (1936b) 469, P. L.
Hsu (1941a) 469. See Multivariate Analysis.

Carleman criterion, 440. 

Cauchy population, estimation of location, 2, 
(Example 18.2) 51 ; median in, (Example 

17.4) 6 ; approximation to estimator for, 
(Example 17.11) 23-4 ; loss of information, 
(Example 17.16) 32. 

Cave, B. M., N.R., 394.

Cement, specification of, Bibl., Wilsdon (1934)
500.

Central confidence intervals, 66. 

limit theorem, Bibl. : Bernstein (1927,
1936) 446, Bochner (1936) 447, Feller
(1936b, 1937) 460, Gnedenko (1938) 465,
Liapounoff (1900, 1901) 476, Lindeberg
(1922) 476, Madow (1939) 476, Pólya (1920)
487. See Convergence in Probability.
Centre of location, 41. 

Chains, in probability, see Markoff Process. 
Characteristic equation, Bibl., Horst (1935) 469, 
Samuelson (1942) 490. 

functions, Bibl. : Boas and Smithies
(1937) 447, Dugué (1939) 458, Glivenko
(1936) 465, Haviland (1934b, 1935) 468,
Kullback (1934, 1936b) 474, Kunetz (1936)
474, Wintner (1936) 500.

Charlier’s series, see Gram-Charlier Series. 
Chi-squared (χ²), minimum, 55-8 ; in testing
goodness of fit, 106-7 ; in testing hypotheses,
299, 302 ; generalisation in multivariate
analysis, see Wishart's Distribution.

Chi-squared, generally, Bibl. : Aroian (1943) 444,
Berkson (1938) 446, Brownlee (1924a) 449,
Camp (1938b) 450, Cochran (1936a, 1942a)
452, Deming (1934, 1938) 456, Eisenhart
(1938) 459, El Shanawany (1936) 459, Fisher
(1922a, 1928c, 1924d) 461, Fry (1938) 464,
Grüneberg and Haldane (1937) 466, Gumbel
(1943b) 466, Haldane (1937, 1938, 1939,
1940) 467, Hoel (1938) 468, Irwin (1929b)
470, Jeffreys (1938b, 1939b) 471, Johnson
and Welch (1939) 471, Koshal (1939) 474,
Mann and Wald (1942) 477, Merrington
(1941) 478, Neyman and Pearson (1931a)
480, K. Pearson (1900c) 483, (1916c, f,
1922a, 1923) 486, (1932b) 486, Robinson
(1933) 489, Seal (1940) 490, K. Smith (1916)
492, Snedecor and Irwin (1933) 492, Sukhatme
(1937a, 1938a) 494, C. M. Thompson
(1941b) 494, Wilson and Hilferty (1931a)
500, Wilson and others (1931b) 500, Yates
(1934b) 502, Yule (1922) 503.

Clitic curve, 142. 

Clopper, C. J., confidence limits for a binomial, 81.

Closeness, in estimation, Bibl., Geary (1944) 464. 

Closure, Bibl., Stokloff (1914) 492. 

Cochran, W. G., on Fisher's distribution, 117, 199 ;
elimination of variates, 170, (Example
22.10) 171 ; theorem on sum of squares,
177-8 ; N.R., 136, 216.

Cograduation, Bibl., Gini (1939) 465, Salvemini 
(1939) 490. 

Combination of tests, 132-3. Bibl. : David (1934)
456, E. S. Pearson (1938) 483, K. Pearson
(1933b) 486, Wallis (1942) 498.

of observations, Bibl. : Bruen (1938) 449, 

Brunt (1931) 449, Mather (1935) 477. See 
Errors, general theory of. 

Compatible events, Bibl., Gumbel (1938b) 466.

Complete sufficiency, in estimation, 40. 

Complex experiments, Bibl., Yates (1936b) 502.
See Design.

Composite hypothesis, 269, 282-3, 287-92, 316-17. 

Compound frequency-distributions, Bibl., Helguero
(1906) 468, K. Pearson (1916b) 484.
See Bimodal.

Concentration, Bibl. : Castellano (1933a, b, 1937)
451, Galvani (1932) 464, Gini (1932) 466,
Pietra (1932a) 486, von Schelling (1934)
497, Wold (1936) 501.

Concordance, Bibl., Gini (1916) 466. 

Concordant samples, 128. 

Conditional statistics, (Exercise 21.10) 139 ; N.R.,
45. Bibl., Bartlett (1938b) 446.

tests, 127-8, 134. 

Confidence, belt, 63 ; coefficient, 63 ; intervals,
62-84 ; for one parameter, 62-6 ; central
and non-central, 66-9 ; for large samples,
69-71 ; shortest sets, 71-4 ; sufficient
estimators, 74-6 ; for several parameters,
76-9, 81-2 ; studentisation in determining,
79-81 ; tables of, 81 ; limits, 63.

Bibl. : Clopper and Pearson (1934) 452,
David (1937, 1938a) 456, Frankel and Kullback
(1940) 463, Kolmogoroff (1941) 474,
K. R. Nair (1940b) 479, Neyman (1937b,
1941a) 480, E. S. Pearson (1932) 482,
Pearson and Sukhatme (1936b) 482, Ricker
(1937) 488, W. R. Thompson (1936) 494,
Wald and Wolfowitz (1939b) 497, (1941c)
498, Wald (1942a) 498, Welch (1939a) 498,
Wilks (1938b, c) 499, (1939a) 500, Wilks
and Daly (1939b) 500.

Configuration of sample, 127. 

Confluence analysis, Bibl. : Cobb (1939) 452,
Frisch (1934b) 464, Mendershausen (1939)
478, Reiersøl (1940, 1941) 488.

Conformity, index of, Bibl., Solomon (1939) 492. 

Confounding, 262-3. Bibl. : Barnard (1936) 444,
R. C. Bose and Kishen (1941) 448, Fisher
(1942c) 462, K. R. Nair (1938b, 1941) 479,
Yates (1933a) 501. See Design.

Consistence, of estimators, 3, 12-16. 

Contagious distributions, Bibl., Feller (1943) 460, 
Neyman (1939a) 480. 

Contingency, Bibl. : Bartlett (1935b) 446, Blakeman
and Pearson (1906) 447, Harris and
Treloar (1927) 467, Hirschfeld (1936) 468,
Kondo (1929) 474, K. Pearson and Blakeman
(1906) 484, K. Pearson (1900a, b) 483,
(1904) 484, (1916b) 485, Stevens (1938a)
493, Weida (1934) 498, Wilks (1936a) 499,
Yates (1934b) 502, Young and Pearson
(1916) 502.

Continuous spectrum, in periodogram, 433. 

Convergence in probability, Bibl. : Cantelli (1916,
1917, 1923, 1933a, 1936) 460, Cramér (1934)
464, Dodd (1926, 1927) 466, Doeblin (1938,
1939) 467, Dugué (1937a) 468, Feller (1937)
460, Fréchet (1930) 463, Jordan (1933) 472,
Kolmogoroff (1937a) 473, Kozakiewicz
(1937, 1938) 474, Lévy (1936b, 1936c, 1939a)
475, Messina (1933) 478, Romanovsky
(1932b) 489, Slutzky (1926, 1937a) 491.
See also Central Limit Theorem.

Convolutions, Bibl., van Kampen (1937a) 496, van 
Kampen and Wintner (1937b, c) 496.

Cornish, E. A., on Fisher’s distribution, 116, 
N.R., 136.

Corrections, for grouping see Grouping Corrections ;
to correlations, Bibl., Roff (1937) 489.

Correlated observations, sampling from, Bibl. :
A. T. Craig (1933b) 463, C. C. Craig (1931a)
463, (1932) 464, Rhodes (1927) 488. See 
also Time-series. 
Correlation, confidence intervals for coefficient,
81 ; Pitman’s test for, 131-2 ; significance 
of, 236. 

Bibl. : Baker (1930b) 444, Bilham (1926)
447, Bispham (1920, 1923) 447, Bonferroni
(1939) 447, Brander (1933) 449, W. Brown
(1909) 449, Brownlee (1910, 1925) 449,
Cheshire and others (1932) 451, Cochran
(1937a) 462, Coleman (1932) 452, Cowles
and Chapman (1936) 463, Day and Fisher
(1937) 456, David (1937, 1938) 456, O. L.
Davies (1930) 466, de Lury (1938) 466,
Deming (1937) 456, Dieulefait (1934a,
1935a) 466, S. C. Dodd (1937) 457, Dunlap
(1931) 468, Eells (1929) 459, Ezekiel (1930a)
459, Fischer (1933a, b) 460, Fisher (1916,
1918, 1921c, 1924a) 461, Fréchet (1933)
463, Frisch (1929) 463, Frisch and Mudgett
(1931) 463, Garwood (1933) 464, Geary
(1927) 464, Gehlke and Biehl (1934) 464,
Geiringer (1933) 464, Jeffreys (1939c) 471,
Khintchine (1928) 473, Kuzmin (1939) 474,
Lindblad (1937) 476, Merzrath (1933) 478,
A. N. K. Nair (1942) 479, Newbold (1926)
479, E. S. Pearson (1923, 1924, 1931a, 1932)
482, K. Pearson (1897b, 1900a, b, 1902a)
483, (1904, 1905, 1907a, 1909, 1910, 1913a, b,
1914, 1921) 484, (1920b, 1925b) 486, Pitman
(1939c) 486, Prokopovic (1935) 487, Quensel
(1938) 487, Rider (1932) 488, Romanovsky
(1926a) 489, Soper (1913, 1914, 1917) 492,
Steffensen (1934) 492, Stouffer (1934,
1936a, b) 493, “Student” (1908b) 493,
Thorndike (1937) 494, Thouless (1939) 496,
Tschuprow (1925, 1928) 495, (1934) 496,
Wicksell (1917a, b, 1921, 1933) 499, Yasukawa
(1925) 501, Yule (1897a, b, 1906,
1907, 1910) 502.

See also Multiple Correlation, Regression.

ratio, Bibl. : Hotelling (1925) 469, Isserlis
(1914, 1916) 470, Kelley (1935) 472, Musselman
(1926) 479, E. S. Pearson (1927) 482,
K. Pearson (1906, 1910, 1911a, b, 1916a)
484, (1917, 1923b) 486, “Student” (1913)
493, Wallis (1939) 498, Wishart (1932a) 500.

Correlogram, 404-12 ; significance of, 412-13 ; of
general linear series, 42fi-l ; relation with 
periodogram, 432-3. 

Cost of living, Bibl. : Bennett (1920) 446, Bowley 
(1919) 448, Konüs (1939) 474.

Cotton yarn, Bibl., Tippett (1936) 496.

Counting experiments, Bibl., Peierls (1935) 486,
Tippett (1932) 496. 

Coutts, J. R. H., data from, (Table 22.1) 150. 

Covariance, analysis of, 237-46. Bibl. : Bailey
(1931) 444, Bartlett (1936d, 1936c) 446, 
Brady (1936) 449, Cochran (1934) 462, 
Cornish (1940c) 463, Cox and Snedecor 
(1936) 463, Hirschfeld (1937) 468, K. R.
Nair (1940a) 479, Snedecor (1936) 492, Wilks 
(1936) 499, (1938c) 500, Wishart (1936) 501.

Covariance, distribution of, (Example 28.1) 334-6. 

Cramér, H., ω²-test, 108-9 ; Carleman criterion,
440. 

Critical region, 270, (Example 27.2) 312-13.

Crop estimation, Bibl., Yates (1936c) 502.

Crum, W. L., N.R., 437.

Cumulants, Bibl. : Ayyangar (1938) 444, Cornish
and Fisher (1937) 453, C. C. Craig (1931c)
464, Dressel (1940, 1941) 458, Frisch (1926)
463, Gotaas (1936) 465, Thiele (1931) 494.
See also k-statistics, Moments.

Curtiss, J. H., N.R., 216.

Curve fitting, Bibl. : Elderton and Hansmann
(1934) 459, Fisher (1912) 461, Jones (1937a)
472, Kerrich (1935) 473, Koshal (1933,
1936, 1939) 474, Myers (1934) 479, Nair
and Shrivastava (1942) 479, Nair and
Banerji (1943) 479, K. Pearson (1901c) 483,
Rhodes (1930) 488, Roos (1937) 489, 
K. Smith (1916) 492, Snow (1911) 492, 
Wald (1940a) 497. See also Least Squares, 
Regression, Trend. 

Curvilinear regression, 146-74. Bibl., Mendershausen
(1937a) 477, T. V. Moore (1937)
478 ; and see Regression.

Cycle, 397-8. See Periodicity. 

Cyclical effects, tests for, 124-7, 370. See
Periodicity.

D²-statistic, 369. Bibl. : Bhattacharya and
Narayan (1942) 446, R. C. Bose (1936a, b)
447, R. C. Bose and Roy (1938c, 1940)
448, S. N. Bose (1935, 1937) 448, Roy
(1939a) 489. See also Discriminatory
Analysis, Multivariate Analysis.

Daly, J. F., on shortest confidence intervals, 82 ; 
on bias in tests, 323 ; N.R., 304.

Daniels, H. E., (Example 23.2) 183-5 ; rank 
correlations, 441. 

Dantzig, G. B., N.R., 304. 

David, F. N., confidence intervals for correlations, 
81 ; N.R., 304.

Davis, H. T., time-series, 433, 434 ; N.R., 394,
437. 

Day, E. E., N.R., 246. 

Death rates, Bibl., Farr (1919, 1920) 460, Pearson 
and Tocher (1916c) 486. 

Decomposition of series, Bibl., Anderson (1927)
443, Smirnoff (1936) 491. See also Time- 
series. 

Decreasing functions, Bibl., C. D. Smith (1939)
491. 

Degrees of freedom, of “Student’s” t, 102 ; of
hypotheses, 270. 

De Lury, D., N.R., 137. 
Denumerable probabilities, Bibl., Steinhaus (1923)
492. 

Dependence, see Independence, Correlation. 

Derksen, J. B. D., on stochastic convergence, 440.

Design, of sampling inquiries, 247-68 ;
preliminary points, 248-9 ; stratified sampling,
249-52 ; design of experiments, 252-4 ;
orthogonality, 254 ; replication, 255 ;
randomisation, 255-6 ; sensitivity of a
test, 256-7 ; Latin squares, 257-62 ;
confounding, 262 ; design and randomisation,
263-6.

Bibl. : Bhattacharya (1943) 446, Christidis
(1931) 451, Fisher (1935c) 462, Jeffreys
(1939e) 471, “Student” (1938) 493, Wold
(1943) 498, Yates (1939e) 502. See also
Blocks, Factorial Experiments, Latin
Squares, etc.

Determinantal equations, Bibl., Girshik (1939)
466. See also Matrix.

Deviance, footnote, 178. 

Difference, of two means, test of (equal variances) 
109-11 ; (unequal variances) 111-14. See 
also Behrens’ Test, Two Samples.

, of two variances, 115-16.

, equations, Bibl., Frisch (1932) 463,
Marples (1932) 477. See also Autoregression
Equations.

Differences of variates, Bibl., Irwin (1937a) 470.

Dilution method, Bibl., R. D. Gordon (1939) 465,
Matuszewski and others (1935) 477.

Dirichlet integrals, 298. 

Discontinuous variates, Bibl. : dell’ Agnola (1937)
456, Guldberg (1934) 466, Muench (1938)
478, H. W. Norton (1937) 481, Ottestad 
(1937, 1938) 481. 

Discordant samples, 128. 

Discriminatory analysis, discriminant function, 
341-8. Bibl. : Barnard (1936) 444, Bartlett
(1939c) 446, Dwyer (1942) 468, Fisher
(1936a, 1938c, 1939b, 1940d) 462, P. L. Hsu
(1939b, 1941a, 1941c) 469, H. F. Smith
(1936) 492, Travers (1939) 496, Wallace
and Travers (1938) 498, Welch (1939b) 498,
Wilks (1938d) 500. See also Multivariate
Analysis. 

Dispersion, Bibl., Norris (1938) 481. See Variance,
etc. 

matrix, 330, 341, N.R., 368.

Dissection of frequency-distributions, Bibl., Burrau
(1934) 450.

Distributed lags, see Lags.

Distributions, generally, Bibl. : Ambarzumian
(1937) 443, Baten (1933a) 445, (1934) 446,
Bispham (1922) 447, Bochner and Jessen
(1934) 447, Bochner (1937) 447, Bowley
(1933) 448, Burr (1942) 450, Camp (1937)
460, Cannon and Wintner (1935) 450,
Chapelin (1932) 451, Cramér and Wold
(1936) 454, Edgett (1931) 458, Eyraud
(1938a) 459, Glivenko (1933) 465, Guldberg
(1935) 466, Hansmann (1934) 467, Hartman
and others (1937) 467, (1939) 468, Haviland
(1934a, b, 1935, 1939) 468, R. Henderson
(1907) 468, Jessen and Wintner (1935)
471, Khintchine (1937a) 473, Kullback
(1936b) 474, Mazzoni (1934) 477, K. Pearson
(1923c, 1924a) 485, R. Schmidt (1934) 490,
von Mises (1939a) 497.

Dodd, E. L., period generated by moving average, 
384, N.R., 394. 

Doob, J., N.R., 45.

Dosage-mortality, Bibl., Garwood (1941) 464.

response, Bibl., Irwin and Cheeseman (1939)
470.

Dugué, D., N.R., 45.

Duration of play, Bibl., de Finetti (1939b) 456,
Fieller (1931a) 460.

Eden, T., on Fisher’s distribution, 206, (Example 
23.8) 214, N.R., 216. 

Edgeworth, F. Y., N.R., 46. 

Edwards, J., Integral Calculus, footnotes, 44 and
60. 

Efficiency, of estimators, 6-7 ; of maximum 
likelihood estimators, 18-19 ; of moments 
in fitting Pearson curves, 43-4 ; of sampling, 
Bibl., Yates and Zacopanay (1935c) 502.

Egg-production, in laying hens, (Table 29.5, 
Figure 29.5) 368. 

Egyptian skulls, (Example 28.3) 345-8. 

Elasticity of demand, Bibl., Mosak (1939) 478,
Schultz (1933) 490. 

Elderton, E. M., (Example 21.14) 133, N.R., 266.

Elderton, Sir William P., N.R., 45.

Electric lamps, testing of, (Example 23.1) 179-80. 

Elimination of variates, in regression analysis, 
167-70. 

Enumeration in sampling, Bibl., Cochran (1939b)
452. 

Equidetectability, curves of, 318. 

Equimodal distributions, Bibl., Mouzon (1930) 478.

Error, in variance-analysis, 187.

Errors, of first and second kind, 270, (Exercise 
26.6) 305. 

, general theory of, Bibl. : Brelot (1936,
1937) 449, Campbell (1935) 450, Cramér
(1928) 454, Deming and Birge (1934) 456,
Edgeworth (1905, 1906) 458, Jeffreys (1933,
1937c, 1938d, 1939d) 471, Mahalanobis
(1922) 476, Wertheimer (1932) 499. See
also Least Squares.

Estimation, generally, 1-49, 50-62 ; in analysis
of variance, 181, 218-19.

Estimator, definition, 2 ; consistence of, 3 ; bias
of, 3-4 ; efficiency of, ^10 ; sufficiency of,
7-12 ; approximation to, 22-4 ; most
general sufficient form, 24-6 ; accuracy of,
28-9 ; ancillary, 32-3 ; in multivariate
case, 33-42 ; location and scale, 40-2 ;
by minimum variance, 50-5 ; by minimum χ²,
55-8 ; by inverse probability, 58-9 ;
by least squares, 59-60. See also Maximum
Likelihood, Minimum Variance.

Bibl. : Aitken and Silverstone (1942)
443, Beall (1939) 446, S. S. Bose and
Mahalanobis (1938b) 448, Darmois (1936,
1936) 466, O. L. Davies and Pearson (1934)
466, Doob (1936) 467, Dugué (1936a, b,
1937b) 468, Fisher (1925b) 461, (1934d,
1938b, d) 462, Geary (1942, 1944) 464,
Halphen (1939) 467, Neyman (1937b) 480,
E. S. Pearson (1937a, 1939) 483, Pitman
(1937b, 1939a) 486, Wald (1939a) 497.
Expectation of life, see Life. 

Expected values, see Mean Values. 

case, in sociological data, Bibl., Stouffer and
Tibbits (1933) 493.

Expenditure of families, (Example 23.9) 214-15. 
Exponential distribution, (Exercise 26.8) 305-6.
Bibl., Paulson (1941) 482, Sukhatme (1936b)
493. 

Extra-sensory perception, Bibl., Greenwood and
Stuart (1937) 466, Stevens (19396) 493. 
Extremes, distribution of, Bibl. : Daniels (1941)
466, de Finetti (1932) 455, Dodd (1923)
466, Fisher and Tippett (1928a) 461,
Gumbel (1934, 1935a) 466, McKay (1935)
477, Olds (1936) 481, Tippett (1926) 496.
See also mth Values.

F-distribution (variance ratio), Bibl., Merrington
and Thompson (1943) 478. See Fisher’s
Distribution. 

Factor analysis (psychology), Bibl. : Bartlett
(1937c) 446, W. Brown (1936) 449, Burt
(1937a, b, 1938a, b) 460, Camp (1932, 1934)
460, Darmois (1934) 456, Emmett (1936)
469, Hoel (1937, 1939) 468, Irwin (1933)
470, Ledermann (1938) 475, Roff (1937)
489, Thomson (1916, 1919b, 1939) 494,
Thurstone (1935, 1938) 496.

Factorial experiments, 199-202. Bibl. : Barnard
(1936) 444, R. C. Bose and Kishen (1941)
448, Cornish (1936, 1940b, c) 463, Goulden
(1937, 1938) 465, P. L. Hsu (1943) 470,
Kishen (1940) 473, Wishart (1938) 501,
Yates (1937b) 502.

moments, Bibl., Gonin (1936) 466, Ottestad
(1939) 481.

sums, in fitting regressions, (Example 22.8)
164-6.

Factorisation of variables, Bibl., S. C. Dodd (1927)
467.


Families of alternatives, 275-6.

Feller, W., N.R., 303.

Fiducial inference, 86-96. Bibl. : Bartlett (1939a)
445, Fisher (1933, 1936a, 1935b, 1936c,
1937b, 1939a, 1940c, 1941a) 462, Garwood
(1936) 464, Ricker (1937) 488, Segal (1938)
491, Wilks (1938b, c) 499, (1939a, b) 500.
See Confidence intervals. 

Field experiments, Bibl., Wishart and Saunders
(1935) 501. See Design. 

Fifteen-constant surface, Bibl., K. Pearson (1925a)
485. 

Filon, L. N. G., N.R., 46.

Finite populations, sampling from, Bibl. : Church
(1926) 452, Hansen and Hurwitz (1940)
467, Irwin and Kendall (1944) 470, Isserlis
(1918c, 1931) 470, Neyman (1925) 480,
O’Toole (1934) 481, Sukhatme (1944) 494,
Tschuprow (1918b, 1921, 1923) 496.

Finney, D. J., z-test, 199 ; test of significance in
periodogram analysis, 434 ; N.R., 137, 216.

Fisher, R. A., fitting by moments, 43 ; fiducial
probability, 90 ; tables for Behrens’ test,
92, 93, 111 ; expansion of “Student’s”
integral, 101 ; tables of t, 102 ; difference
of two means, 110 ; z-distribution, 116,
117 ; configuration of a sample, 127 ;
fitting regressions, 165 ; theorem on sum
of squares, 176-7 ; design of experiments,
263 ; discriminatory analysis (Example
28.2) 342-4 ; distribution of canonical
correlations, 367 ; significance of a
periodogram, 434 ; N.R., 45, 61, 83, 94, 136, 173,
216, 245, 266, 359.

Exercises from : (Exercise 17.1) 46, 
(Exercises 17.4, 17.5, 17.6) 46, (Exercises
17.12, 17.15, 17.16) 48, (Exercise 17.19) 49,
(Exercise 18.3) 61, (Exercises 20.1, 20.2) 
94-5. 

Fisher’s distribution (z-distribution), properties of, 
116-18 ; in variance analysis, 179, 199 ;
in non-normal case, 205-6, 234-6, (Example
26.8) 289-91 ; in linear hypothesis, 301 ; 
in discriminatory analysis, 345. 

Bibl. : Aroian (1941) 444, R. A. Chapman
(1938) 461, Cochran (1940a) 462,
Daniels (1938a) 454, Eden and Yates (1933)
468, Fisher (1924c) 461, P. L. Hsu (1941c)
469, Lawley (1938) 475, McCarthy (1939)
477, Paulson (1942) 482, Welch (1937) 498.

Fitting, see Curve Fitting, Least Squares. 

Flood flows, Bibl., Gumbel (1938a, 1941) 466.

Fluctuations in time-series, Bibl., R. A. Gordon
(1937) 466. See Time-series. 

Forecasting, Bibl. : Cowles (1933) 463, Cowles and
Jones (1937) 463, de Finetti (1937) 466, 
Schultz (1930) 490, Yates (1936c) 602. 

Forsyth, A. R., Calculus of Variations, footnote, 60.
Fourier analysis, see Harmonic Analysis,
Periodicity.

Fragmentary samples, Wilks (1932a) 499. 

Frankel, L. R., N.R., 136, 266.

Freedom, degrees of, see Degrees of Freedom. 

Frequency-distributions, see Distributions. 

Frequency theory of probability, Bibl. : Campbell
(1939) 460, Cantelli (1923, 1932, 1933b) 450,
(1936) 451, Dörge (1934, 1936) 458, von
Mises (1931) 497. See Probability, Random
Sequence. 

Friedman, M., (Example 23.9) 214-15. 

Frisch, R., N.R., 368.

Galton's problem, Bibl. : Galton (1902) 464, Irwin
(1925a) 470, K. Pearson (1902c) 484. See
Rank Correlation. 

Gamma distribution, Bibl., Kibble (1941) 473.
See Type III. 

Garwood, F., confidence intervals for Poisson dis- 
tribution, 81. 

Gauss, K. F., variance of residuals, 60-1 ;
standard errors, 163 ; N.R., 46.

Gaussian distribution, see Normal Population. 

Geary, R. C., distribution of t, 102-4 ; test of 
normality, 106 ; theorem on independence, 
118; (Exercises 21.1, 21.2) 137-8; N.R., 
46, 136. 

Geary’s ratio, Bibl., Geary (1935a, b, 1936a) 464,
Tricomi (1937) 495. 

General factor (intelligence), see Factor Analysis. 

Generalised distance, of Mahalanobis, N.R., 369.

Generating functions, Bibl., Aitken (1931) 442.
See Characteristic Functions.

Geometric Mean, Bibl., Camp (1938a) 460, Norris
(1938, 1940) 481. 

Germination of wheat-seeds, (Example 23.7) 207-9.

Gini’s mean difference, 108. 

Girshik, M. R., (Exercise 28.11) 362, N.R., 369.

Glass, seed in, (Example 23.6) 202-4. 

Goodness of fit, tests of, 106-9. Bibl. : David
(1939) 455, Neyman (1937a) 480, K. Pearson
(1934) 486, Thomson (1919a) 494. See
Chi-squared. 

Gosset, W. S. (“Student”), 80, 266, N.R., 394.

Gould, C. E., (Example 23.6) 202-4. 

Goulden, C. H., N.R., 216, 266.

Grades, see Rank Correlation, Galton’s Problem. 

Graduation, Bibl., Aitken (1933a, b, c) 442, Keyfitz
(1938) 473. See Interpolation, Least
Squares, Orthogonal Polynomials, Trend. 

Graeco-Latin square, 261-2. Bibl., R. C. Bose
(1938b) 448.

Gram-Charlier series, estimation in (Exercise 18.1)
61 ; for non-normal t, 103 ; goodness of
fit in, 109 ; in z-distribution, 116. Bibl. :
Aitken and Oppenheim (1931) 442, Aitken
(1932) 442, Aroian (1937) 444, Baker
(1930d, 1936) 444, Charlier (1906, 1912,
1928, 1931) 461, Cornish and Fisher (1937)
463, C. C. Craig (1931b) 464, Cramér (1926,
1936b) 454, Doetsch (1934) 467, Edgeworth
(1906) 458, Gram (1879) 466, Hildebrandt
(1931) 468, Jacob (1933, 1935, 1937) 471,
Meisener (1938) 477, Quensel (1938) 487,
Samuelson (1943) 490, Schmidt (1934) 490,
Steffensen (1930) 492, Wicksell (1917b,
1934a) 499.

Greenstein, B., N.R., 437. 

Grouping corrections, Bibl. : Abernethy (1933)
442, Alter (1939) 443, Baten (1931) 446,
Blümel (1939) 447, Burkhardt and Stackelberg
(1939) 449, Carver (1933, 1936) 461,
C. C. Craig (1936c, 1941b) 454, Elderton
(1933, 1938b) 469, Kendall (1938a) 472,
Lewis (1935) 475, Sandon (1924) 490.

, effect on correlations, Bibl., Gehlke and
Biehl (1934) 464.

, significance of, Bibl., Stevens (1937b) 493.

Groups of experiments, Bibl., Yates and Cochran
(1938b) 502.

Hampton, W. M., (Example 23.6) 202-4. 

Hansmann, G. H., N.R., 46.

Harmonic analysis, Bibl. : T. F. Anderson (1935)
443, Brunt (1928) 449, Carslaw (1930) 461,
Fisher (1929a) 461, (1940a) 462, Frisch 
(1928, 1931, 1933) 463, Poliak (1926) 487, 
Turner (1913) 496, Wiener (1930) 499. 
See Periodicity. 

mean, Bibl., Norris (1939) 481. See Mean
Values.

Hartley, H. O., on z-test, 199 ; k samples, 299 ;
N.R., 137, 216, 304.

Heads and tails, Bibl., Fieller (1931c) 460. See 
Duration of Play. 

Hendricks, W. A., (Exercise 21.9) 139 ; N.R., 136. 

Hermite polynomials, see Tchebycheff-Hermite
Polynomials. 

Heterogeneous populations, Bibl., Baker (1930c,
1932) 444. See also Lexis Theory, Stratified
Sampling.

Hierarchies in correlation, Bibl., Thomson (1916,
1919b, 1935) 494, Wilson (1928) 500. See
Factor Analysis. 

Higham, J. A., (Exercise 29.7) 396. 

Highest audible pitch, (Example 22.4) 162-3, 
(Example 22.5) 155-6. 

Hirschfeld, H. O., see Hartley, H. O. 

Homogeneity, Bibl. : Baker (1941) 444, Hartley
(1940) 467, Welch (1938a) 498. See k
samples. 

Horse population and wheat prices, 436.

Hotelling, H., canonical correlations, 348-58; 
(Exercises 28.7-28.10) 360-2 ; N.R., 45,
136, 359. 
Hotelling’s T², 323, 336-8 ; 369. Bibl.,
Hotelling (1931) 469, P. L. Hsu (1938c) 469.

Hsu, P. L., linear hypothesis, 301 ; Wishart’s 
distribution, 333 ; canonical correlations, 
367 ; N.R., 304, 359.

Hypergeometric series, Bibl. : Ayyangar (1934)
444, Camp (1926a) 450, O. L. Davies (1933,
1934) 465, Gonin (1936) 465, K. Pearson
(1899b) 483, (1924b, c) 485, Romanovsky
(1926b) 489.

Hypotheses, testing of, see Statistical Hypotheses. 

Imaginary random variables, Bibl., Eyraud (1938b)
469. 

Immunity, Bibl., Brownlee (1905) 449.

Incomes, distribution of, Bibl., Cantelli (1929)
460, Darmois (1933) 466. 

Incomplete blocks, see Blocks. 

Independence, of quadratic forms, Bibl. : Cochran
(1934) 462, A. T. Craig (1936a, 1943) 453, 
Madow (1940) 476. 

, statistical, Bibl. : del Vecchio (1933) 466,
Kac and van Kampen (1939) 472, Marcinkiewicz
and Zygmund (1937) 477, Tschuprow
(1934) 496. See also Correlation,
Contingency, etc.

Index, distribution of, see Ratio. 

numbers, Bibl. : Bowley (1926) 448, Claremont
(1916) 462, Crowther (1934) 464,
Dodd (1937c) 457, Edgeworth (1925a, b, c)
469, I. Fisher (1922) 460, Flux (1921, 1933)
463, Frickey (1937) 463, Frisch (1930) 463,
Haberler (1927) 467, Konüs (1939) 474,
Persons (1928) 486, Rhodes (1936) 488,
Schultz (1939) 490, Yates (1939c) 502.

Indices, correlation of, Bibl. : Baker (1937) 444,
J. W. Brown and others (1914) 449,
Claremont (1916) 462.

Industrial accidents, Bibl., Newbold (1927) 479.

processes, see Quality Control. 

Inequalities, Bibl. : Mortara (1934) 478, Nanimi
(1923b) 479, Norris (1935, 1937) 481,
Romanovsky (1938) 489, Shohat (1929)
491, C. D. Smith (1930) 491, von Mises
(1939b) 497, Wald (1938) 497.

Infantile mortality, Bibl., Feld (1924) 460.

Infection in potatoes, (Example 24.6) 230-2, 
(Example 24.6) 232-3.

Inference, see Statistical Hypotheses. 

Information, amount of, 29-30 ; loss of, 30-2 ;
in minimum χ², 57-8. Bibl. : Bartlett
(1936a, b) 446, Fisher (1934b, 1936a) 462.

Intensity, of a periodogram, 426. 

Interaction, in variance-analysis, 187, 188-9.

Interference, analysis of, Bibl., Stevens (1936) 493.

Interpolation, Bibl. : Comrie (1936) 462, Erdős
and Turán (1937, 1938) 469, Feldheim
(1936a) 460, Fisher and Wishart (1927)
461, Gini (1921) 466, Lidstone (1937) 476,
Pietra (1932b) 486, Salvemini (1934) 490,
Simaika (1942) 491, Tchebycheff (1907)
494. See also Graduation, Least Squares,
Orthogonal Polynomials.

Intra-class correlation, 181. Bibl., Harris (1914)
467, Harris and Gunstad (1931) 467. 

Intrinsic accuracy, in estimation, 28-9. 

Invariants of frequency curves, Bibl., Zoch
(1934) 503.

Inverse probability, in estimation, 58-9 ;
relationship with fiducial inference, 90-1, 93-4.
Bibl. : Bayes (1763) 446, Fisher (1926c,
1930a) 461, (1932, 1936a) 462, Isserlis (1936)
471, Jeffreys (1937b) 471, Tomior (1937)
496, Wisniewski (1937b) 501.

Iris (flower), (Example 28.2) 342-4. 

Irregular Kollektiv, 123. See Random Sequence. 

Irwin, J. O., (Exercise 23.1) 216-17 ; sampling
moments, 440 ; N.R., 216.

Item analysis, Bibl., Merril (1937) 478.

Iterations, see Runs.

J-shaped distributions, Bibl., Elderton (1933)
459, Solomon (1939) 492. 

Jackson, W. R., N.R., 304.

Jeffreys, H., (Example 18.6) 56-7 ; fiducial 
inference, 90-1, 93-4 ; N.R., 61, 94, 266.

Jensen, A., N.R., 266. 

Joint sufficiency, 39.

Judgments, validity of, Bibl., Eysenck (1939) 469.

k samples, problem of, 119-22, 205-9 ; bias in,
323, (Exercise 27.2) 326. Bibl. : Bartlett
(1934a) 446, Bishop (1939) 447, Bishop and
Nair (1939) 447, R. C. Bose and Roy (1940)
448, G. W. Brown (1939) 449, Neyman
and Pearson (1931b) 480, Pearson and
Wilks (1933b) 482, Sukhatme (1936b) 493,
(1937b) 494, Welch (1936) 498, Wilks
(1935b) 499. See L-tests.

k-statistics, Bibl. : Fisher (1929b) 461, Fisher and
Wishart (1931) 462, C. T. Hsu and Lawley
(1939) 469, Kendall (1940) 472, (1942b) 473,
Wishart (1929a, b, 1930, 1933b) 500. See
also Moments, sampling.

Kelley, T. L., (Example 28.4) 351-2. 

Kermack, W. O., N.R., 136.

Keynes, Lord, (Exercise 17.7) 47.

Kolmogoroff, A., confidence intervals for
terminals, 83.

Kolodzieczyk, St., linear hypothesis, 293 ; N.R.,
304. 

Koopman, B. O., (Exercises 17.13, 17.14) 48, 
N.R., 46. 

Koshal, R., N.R., 46.

Kronecker delta, 329. 
Kurtic curve, 142.

Kurtosis, Bibl., Frisch (1934a) 464.


L-tests, Bibl. : Mahalanobis (1933) 476, Mood
(1939) 478, Nayer (1936) 479, Paulson 
(1941) 482, Welch (1936a) 498, Wilks and 
Thompson (1937a) 499. See k samples. 

Lag correlation, 435-6. 

Lags, distributed, Bibl. : Alt (1942) 443, Koopmans
(1941) 474, K. R. Nair (1936) 479,
Zrzavy (1933) 503. 

Lanarkshire milk investigation, 266. 

Large numbers, law of, see Convergence in Proba- 
bility. 

Largest member of a sample, see Extremes. 

of a set of variances, see Variance ratio. 

Latent roots of a matrix, see Matrix. 

Latin squares, 257-62, 266. Bibl. : R. C. Bose
(1938b) 448, R. C. Bose and Nair (1942b)
448, Euler (1782) 459, Fisher and Yates
(1934c) 462, Fisher (1942d, e) 462, Mann
(1943) 477, H. Norton (1939) 481, Stevens
(1938b) 493, Welch (1937) 498, Yates (1933c)
501, (1936a) 502.

Lattices, distributions on, van Kampen and 
Wintner (1939b) 496.

Lawley, D. N., N.R., 359.

Least squares, in estimation, 59 ; in regression
analysis, 145 ; in time-series, 371. Bibl. :
Adcock (1878) 442, Aitken (1933a, b, c,
1935a) 442-3, Davis (1933) 455, David and
Neyman (1938c) 455, Deming (1931, 1934,
1935, 1937) 456, Hendricks (1931, 1934)
468, E. Johnson (1940) 471, Jones (1937a)
472, Jordan (1932, 1934) 472, Kerrich (1937)
473, Shoffer (1935) 491, Sheppard (1914,
1929) 491, Sterne (1934) 493, Wisniewski
(1937a) 501, Wong (1935) 501.

Lexis, W., ratio, 119 ; N.R., 216.

theory, Bibl. : Geiringer (1942) 465, Rider
(1934) 488, Tschuprow (1918, 1919a) 495,
von Bortkiewicz (1931) 497.

Life, expectation of, etc., Bibl. : Brownlee and
Morison (1911) 449, Dublin and others 
(1935) 458, Greenwood (1922) 466, Gumbel 
(1924, 1925, 1932) 466, Seal (1940) 490, 
Wilson (1938) 500. 

Likelihood, in estimation, see Maximum Likelihood ;
in testing hypotheses, 277-80, 295-302,
32a-0. Bibl., Fisher (1932, 1934a, b)
462, Wilks (1936a) 499.

Likelihood-ratio tests, Bibl. : Daly (1940) 464,
Neyman and Pearson (1933c) 480, Wilks
(1938a) 499, Wilks and Thompson (1937a) 
499. See L-tests. 

Limiting form of significance tests, 322. Bibl.,
Peiser (1943) 486. 


Linear equations subject to error, Bibl., Lonseth
(1942) 476. 

hypotheses, 292-5, 300-2. Bibl., Johnson
and Neyman (1936) 472, Kolodzieczyk
(1935) 474.

Linearity of regression, see Regression. 

Linkage, Bibl., Finney (1940, 1941, 1942) 460,
N. L. Johnson (1940b) 472.

Link-relatives, Bibl., Robb (1930) 489. See Index
Numbers. 

Live births, proportion of males among, (Example
21.8) 120.

Location, estimation of parameters of, 40-2 ; 
centre of, 41 ; Pitman’s tests of, 323-6. 
Bibl., Pitman (1939a, b) 486.

Logarithmic variate, Bibl. : Finney (1941b) 460,
Jenkins (1932) 471, Nydell (1919) 481,
Pae-Tsi-Yuan (1933) 481, Quensel (1936)
487, Wicksell (1917a) 499, Williams (1937)
500.

Loss of information, in estimation, 30-2. 

weight in soil, (Example 22.3) 140-52,
(Example 22.6) 158.

m rankings, problem of, (Example 23.9) 214-15. 
Bibl., Friedman (1937, 1940) 463, Kendall
and Babington Smith (1939b) 472.

Macaulay, F. R., (Exercise 29.4) 395 ; N.R., 394. 

MacStewart, W., N.R., 304. 

Madow, W. G., N.R., 359. 

Magnetic declination, Bibl., Schuster (1899) 490.

Magnitude, random division of, Bibl., Fisher
(1940a) 462, Stevens (1939a) 493.

Mahalanobis, P. C., N.R., 303, 304, 359. 

Males, proportion in births, (Example 21.8) 120 ; 
marriages of, (Example 21.9) 121-2. 

Markoff, A. A., theorem on least squares, (Exercise 
26.5) 267. 

process (Markoff chains), Bibl. : Doeblin
(1936, 1937) 457, Elfving (1937, 1938) 459,
Feldheim (1936b) 460, Fortet (1935-8) 463,
Fréchet (1935, 1936b, 1937a) 463, Geiringer
(1938) 464, Hadamard and Fréchet (1933)
467, Hostinsky (1937) 469, Kolmogoroff
(1937b) 473, Lévy (1935b, 1936c) 475,
Markoff (1912) 477, Mihoc (1934) 478,
Onicescu and Mihoc (1935-9) 481,
Romanovsky (1936a) 489, S5ukarev (1932) 490.

Marriage, males according to age at, (Example
21.9) 121-2. 

rate in England and Wales, (Table 30.2) 397,
(Example 30.3, Table 30.5, Figure 30.4)
408-9.

Martin, E. S., N.R., 359. 

Mass production, see Quality Control. 

Matching problems, Bibl. : Battin (1942) 446,
D. W. Chapman (1935) 461, J. A. Greenwood
(1938) 465, (1940) 466, Greville (1938,
1941) 466, Olds (1938a) 481, Vernon (1936)
496, Wilks (1932c) 499.

Mathematical Tripos, distribution of women
obtaining firsts in, (Example 18.5) 56-7. 

Matrix, arithmetic of, Aitken (1937a, b, 1938) 443,
Bingham (1941) 447, Dwyer (1941a, b) 458,
Hotelling (1943) 469.

Maximum likelihood estimators, 12-49 ;
consistence, 13-15 ; normality, 15-17 ; variance
of, 17-18 ; efficiency of, 18-19 ; sufficiency, 
19-20 ; for several parameters, 34-49 ; 
variance and covariance of, 36-7 ; relation 
with minimum variance, 53, and with
confidence intervals, 73-4.

Bibl. : Carlson (1932) 451, Fisher (1912,
1921a, 1925b, 1928c) 461, (1932, 1934a)
462, Hotelling (1930) 469, Jeffreys (1938b,
1938c) 471, Koshal (1933, 1935, 1939) 474,
Myers (1934) 479, E. S. Pearson (1937a)
483, K. Pearson (1936) 486, Welch (1939c)
499.

McKendrick, A. G., N.R., 136.

Mean, arithmetic, estimation of, 2 ; (Example
17.6) sufficient estimator for, 11 ; (Example
17.7) 19-20 ; most general distribution for
which it is estimator (Example 17.10) 22 ;
significance of, 98-100, (Examples 27.1,
27.2) 311-12.

deviation, in testing normality (Geary’s
ratio), 106 ; distribution of m.d., Bibl. :
Fisher (1920) 461, Fréchet (1936a) 463,
Tricomi (1936b, 1937) 495.

difference, 108. Bibl. : Cantelli (1913) 450,
de Finetti and Paciello (1930b) 455, de
Finetti (1931) 455, U. S. Nair (1936) 479,
Wold (1935) 501.

values, Bibl. : Aumann (1934-5) 444, Bunak
(1936) 449, A. T. Craig (1936b) 453, Dodd
(1934, 1937a, b, c, 1938) 457, Doodson (1917)
458, Dressel (1941) 458, Norris (1935, 1937)
481, Wertheimer (1937) 499, Yasukawa
(1925) 501, Zoch (1935, 1937) 503.

Means, distribution of, Bibl. : Baker (1930d, 1931,
1932, 1936, 1940) 444, Behrens (1929) 446,
R. C. Bose (1938a) 448, Carlson (1932) 451,
Cochran (1937a) 452, A. T. Craig (1932)
453, Dodd (1926-7) 456, Dunlap (1931) 458,
Hall (1927b) 467, Holzinger and Church
(1929) 469, Irwin (1927, 1929, a, 1930) 470,
Immer (1937) 470, Isserlis (1918a) 470,
Jeffreys (1940) 471, Kolmogoroff (1929) 473,
Pizzetti (1939) 487, Pollard (1934) 487,
Rhodes (1927) 488, Romanovsky (1929)
489, Simon (1943) 491, Truksa (1940) 495.
See also Central Limit Theorem, Mean 
Values. 

, test of difference, see Difference ; in multi- 
variate analysis, 338-41. 



Mean-square contingency, see Contingency. 

successive difference, Bibl. : Hart (1942)
467, von Neumann and others (1941a, b)
497, J. D. Williams (1941) 500.

Median, as estimator, 5 ; confidence intervals for,
(Exercise 19.5) 84. Bibl. : Cisbani (1938)
452, Doodson (1917) 458, Gini and Galvani
(1929) 465, Gini (1938) 465, Gini and
Zappa (1938) 465, Gulotta (1938) 466,
Haldane (1942b) 467, Hojo (1931, 1933)
469, Jackson (1921) 471, K. R. Nair (1940b)
479, K. Pearson (1931b) 486, Pollard (1934)
487, Savur (1937a) 490, W. R. Thompson
(1936) 494, Ville (1936c) 496.

Migration, see Random Migration. 

Minimum variance, of maximum likelihood
estimators, 18-19 ; in estimation, 50-5.

Minimum χ², in estimation, 55-8.

Missing plot technique, 229-33. Bibl. : Allan
and Wishart (1930) 443, Cornish (1940a, b)
453, K. R. Nair (1940a) 479, Yates (1933b)
501, Yates and Hale (1939b) 502.

Mode, Bibl. : Doodson (1917) 458, Haldane
(1942b) 467, K. Pearson (1902b) 484,
Yasukawa (1926) 501.

Moment-function, Bibl., U. S. Nair (1939) 479.
See Characteristic Functions, Generating 
Functions. 

Moments, efficiency of, 43-4. 

of distributions (specification), Bibl. : Cornish
and Fisher (1937) 453, Fisher (1937a)
462, R. Henderson (1907) 468, O’Toole
(1933) 481, Pearl (1937) 482, K. Pearson
(1936) 486, Romanovsky (1936b) 489, von
Mises (1937) 497. See Curve Fitting.

, problem of, Bibl. : Bodowadt (1936) 447,
Broggi (1934) 449, Chlodovsky (1938) 451,
Hamburger (1920, 1921) 467, Haussdorf
(1923) 468, Haviland (1935, 1936) 468,
Marcinkiewicz (1939) 477, Pólya (1920,
1938a) 487, Stekloff (1914) 492, Stieltjes
(1918) 493, Widder (1934) 499.

, sampling, Bibl. : Bernstein (1932) 446,
C. C. Craig (1928) 453, (1940) 454, Dwyer
(1937a, 1938, 1940) 458, Fisher (1929b)
461, Fisher and Wishart (1931) 462, Geary
(1933) 464, Irwin and Kendall (1944) 470,
Isserlis (1918b, c, 1931) 470, St. Georgescu
(1932) 493, Sukhatme (1938c, 1944) 494,
Tschuprow (1918b, 1921, 1923) 495, Wilks
(1934, 1936) 499, Wishart (1929a, b, 1930,
1931a, b, 1933b) 500, Wishart and Bartlett
(1932b) 500, Ziaud-din (1938) 503. See
also k-statistics.

Monotonic functions, in distribution theory, Bibl.,
Bochner (1937) 447. 

Mood, A. M., N.R., 304. 

Moore, G., phases in time-series, 126 ; N.R., 136.
Morant, G., N.R., 394.

Morgan, W. A., N.R., 137.

Mortality, see Life. 

Most-efficient estimator, 6, 10, 18-19.

Most-selective confidence intervals, 75, 82.

Moths, effect of weather on, (Example 22.10) 
171-2. 

Moving averages, 372-87, 399. Bibl. : Dodd
(1930a, 1941a, b) 457, Frisch (1938) 464,
Wold (1938b) 501.

mth values, Gumbel (1934, 1935a, 1939)
466.

Multinomial distribution, Bibl., Kullback (1937)
474, Lurquin (1937) 476. 

Multiple correlation, Bibl. : Bacon (1938) 444,
R. C. Bose (1934) 447, Fisher (1928b) 461,
Hall (1927a) 467, Kelley and McNemar
(1929) 472, Kullback (1936c) 474, K. Pear-
son and Lee (1908) 484, K. Pearson (1916d)
485, K. Pearson and Young (1918) 485,
Soper (1920a) 492, Starkey (1939) 492,
Tappan (1927) 494, Wilks (1932b) 499,
Wishart (1931b) 500, Wong (1937) 501.

curvilinear regression, 167, 236. See Re- 
gression. 

happenings, Bibl., Greenwood and Yule
(1920) 466, K. Pearson (1912b, 1913) 484.
See Poisson Distribution, Pólya Distribu-
tion.

Multivariate analysis, 328-62 ; Wishart's distri- 
bution, 330-4 ; Hotelling’s distribution, 
335-8 ; significance of set of means, 338- 
41 ; discriminatory analysis, 341-8 ; 
canonical correlations, 348-58. 

Bibl. : Bartlett (1939b, 1941) 445, Bishop
(1939) 447, Fisher (1936a, b, 1938c, 1939b,
1940d) 462, Hotelling (1933, 1936a, b) 469,
P. L. Hsu (1939b, 1941a, c, d) 469, Madow
(1937, 1938) 476, Mahalanobis (1930, 1936a)
476, Mahalanobis and others (1936b) 476,
Martin (1936) 477, Rider (1936) 488, Roy
(1938, 1939a, b, 1942a, b) 489, Simonsen
(1937) 491, Wald and Brookner (1941b)
498.

distributions, estimation in, 33-7 ; normal,
see Normal. Bibl. : Laser (1942) 475,
Lukomski (1930) 476, Mahlmann (1935)
477. See also Multiple Correlation.

Myers, R. J., N.R., 45. 

Nair, K. R., confidence intervals for median, 81, 
N.R., 83.

Nayer, P. N., testing hypotheses, 299 ; N.R., 304. 

Negative binomial, Bibl., Fisher (1941b) 462,
Greenwood and Yule (1920) 466. See Pólya
Distribution.

Neyman, J., confidence intervals, 75-6 ; Behrens’ 
test, 93 ; randomised blocks, 214 ; theory
of tests, 270, 299, 308, 311, 323 ; Exercises
from : (Exercises 19.2, 19.3) 83, (Exercise 
21.12) 140, (Exercises 26.2, 26.3) 304, 
(Exercises 26.4, 26.5) 305, (Exercise 27.3) 
327. N.R., 45, 83, 94, 136, 172, 266, 303, 
304, 326. 

Nisbet, S. D., (Example 25.1) 258-9. 

Non-central confidence intervals, 66. 

t, Bibl., N. L. Johnson and Welch
(1940a) 471.

Non-normal data, in variance-analysis, 205-15. 

populations, Bibl. : Baker (1934) 444,
Bartlett (1935a) 445, C. C. Craig (1941a)
454, Geary (1936b) 464, Laderman (1939)
474, A. N. K. Nair (1942) 479, Pearson and
Adyanthaya (1928, 1929) 482, E. S. Pearson
(1931b) 482, Rider (1931a) 487, Rietz (1932,
1939) 488, Thorndike (1937) 494.

Non-orthogonal data, Bibl. : K. R. Nair (1942) 
479, Wilks (1938e) 500, Yates (1934a) 501. 

Non-parametric tests, 322. Bibl., Scheffé (1943)
490. 

Non-random samples, Bibl., “Student” (1909) 
493. 

Nonsense correlations, Bibl., Yule (1926) 503. 

Normal equations, solution of, Bibl., Hoel (1941)
468.

population, estimation of mean, 2, (Example
17.6) 11, (Example 17.7) 19-20, (Example
18.1) 51 ; estimation of variance, (Example
17.6) 11, (Example 18.4) 54-5 ; centre of
location of, (Example 17.22) 42 ; confidence
intervals for mean, (Example 19.1) 63-4,
(Example 19.3) 70 ; fiducial distribution,
85 ; bivariate, (Example 17.17) 33-4,
(Example 17.18) 37-8 ; regressions of,
(Example 22.1) 144.

Bibl. : Baker (1931) 444, Bergstrom
(1918) 446, Cramér (1923, 1936) 454, Erdös
and Kac (1939) 459, Haldane (1942a, b)
467, C. T. Hsu (1940, 1941) 469, Isserlis
(1918b) 470, Kac (1939) 472, Khintchine
(1936) 473, Kullback (1935a) 474, Leder-
mann (1939) 475, Lehmann (1939) 475,
Lengyel (1939) 475, K. Pearson (1924c) 485,
Pólya (1923) 487, Raikov (1938) 487,
Rhodes (1928) 488, Tricomi (1935, 1936a,
1936b) 495, Yule (1938b) 503.

Normalisation of frequency functions, Bibl. : 
Cornish and Fisher (1937) 453, Haldane 
(1938) 467, Mahalanobis and others (1936b)
476, Paulson (1942) 482. 

Normality, tests of, 105-6. Bibl. : Fisher (1930b)
461, Geary (1935a, b, 1936a) 464, Geary
and Pearson (1938) 464, E. S. Pearson 
(1930, 1935c) 482, Yasukawa (1934) 501. 

Nuisance parameters, 134. Bibl., Hotelling (1940)
469.
Olds, E. G., N.R., 266.

Omega, for testing goodness of fit, 107-9. Bibl.,
Smirnoff (1936) 491. 

One-sided confidence intervals, 76. 

Oppenheim, S., N.R., 437.

Order, in random series, 122-4, and see Random 
Order. 

Orthogonal data, in variance-analysis, 219, 254.

polynomials, 146-64, 159-67. Bibl. : Aitken
(1931, 1933a, b, c) 442, Allan (1930) 443,
Dieulefait (1934b) 456, Fisher (1921b, 1924b)
461, Greenleaf (1932) 466, Jackson (1934,
1937, 1938) 471, Jordan (1932) 472, Lidstone
(1933) 476, Romanovsky (1927) 489, San-
sone (1933) 490, Shohat (1936) 491, C. D.
Smith (1939) 491, Tartlor (1936) 494,
Tchebycheff (1907) 494, Webster (1938)
498, Wishart (1933a) 500, Wong (1936) 501.

transformations, Bibl., Landahl (1938) 474,
Ledermann (1938) 475.

Oscillations, in time-series, 369, 370, 380, 397-8. 
See Periodicity. 


p-statistics, Bibl., Roy (1939b, 1942a) 489. See
Multivariate Analysis. 

P^y, test, see Combination of Tests. 

Paired comparisons, Bibl., Kendall and Babington
Smith (1940) 472. 

Parameters, estimation of, see Estimation. 

of location and scale, 40-2. 

Partial correlations, Bibl. : Isserlis (1914, 1916)
470, Stouffer (1934) 493, Subramanian
(1935) 493. 

Pasteurised milk, in feeding, (Example 21.14) 133. 

Path coefficients, Bibl., Engelhart (1936) 459,
Wright (1934) 501.

Paulson, E. A., z-distribution, 118 and N.R., 136. 

Peaks, in time-series, 124. 

Pearson distributions, moments in fitting, 43-4 ;
sufficient estimators in (Exercise 17.18) 49.
Bibl. : Ambarzumian (1937) 443, Baker
(1940) 444, Beale (1937) 446, C. C. Craig
(1936b) 454, Dieulefait (1936b) 456, Fisher
(1921a) 461, Hildebrandt (1931) 468, Irwin
(1930) 470, K. Pearson (1894, 1896, 1901b)
483, (1916a) 484, (1924a) 485, Romanovsky
(1924) 489, Wishart (1926) 500. See also
Type I, etc.

Pearson, E. S., confidence intervals for binomial, 
81 ; t in non-normal case, 103 ; test of 
normality, 106; z in non-normal case, 
205 ; (Exercise 23.4) 216-17 ; analysis of 
covariance, 238 ; (Exercises 26.2, 26.3, 26.4, 
26.6) 304-6 ; N.R., 45, 83, 136, 137, 246,
266, 303, 304, 369. 

— K., (Example 21.14) 133; N.R., 46, 137, 
172, 173, 394. 


Peas, yields of, (Example 23.6) 200-2. 

Periodicity and periodogram analysis, 423-5,
432-3, 433-6. Bibl. : Alter (1924, 1925,
1926a, b, 1933, 1937) 443, Beveridge (1921,
1922) 446, Bradley and Crum (1939) 449,
Brownlee (1924b) 449, Bruns (1921) 449,
Brunt (1925, 1928) 449, Buys-Ballot (1847)
450, J. I. Craig (1916) 454, Crum (1923,
1926) 454, Dodd (1930) 456, (1939a, b,
1941a, b) 457, Frisch (1928, 1931, 1933)
463, Greenstein (1936) 465, Hersch (1934)
468, Kalecki (1936) 472, Koopmans (1940)
474, Kuznets (1929, 1933) 474, Larmor and
Yamaga (1917) 476, Mitchell (1913) 478,
Mitchell and Burns (1936) 478, Moore (1914,
1923) 478, Moulton (1938) 478, Oppenheim
(1909) 481, Pietra (1926) 486, Poliak (1927)
487, Poliak and Kaiser (1936) 487, Powell
(1930) 487, Savur (1941) 490, Schuster
(1898, 1899, 1906) 490, Soper (1929b) 492,
Starkey (1939) 492, Stumpff (1926, 1937)
493, Tinbergen (1937, 1938) 496, Tintner
(1936) 495, Trachtenberg (1921) 496, Vinci
(1934) 496, Walker (1914, 1926, 1927, 1931)
498, Wallis and Moore (1941) 498, Yule
(1927a) 503. See also Harmonic Analysis,
Time-series.

Phases, in time-series, 124, 125-6. 

Pilot sampling, 262, N.R., 266.

Pitman, E. J. G., tests of significance, 128-32,
136 ; z-test, 211 ; tests of hypotheses,
323-6 ; Exercises from, (Exercises 17.9,
17.10, 17.11) 47, (Exercise 21.3) 138,
(Exercise 21.16) 140, (Exercise 27.2) 326.
N.R., 45, 137, 216.

Plant breeding, Bibl., Y. Tang (1938) 494.

Plot arrangements, Bibl., Tedin (1931) 494. See
Design. 

Poisson distribution, (Example 17.9) 21-2; con- 
fidence intervals for, (Example 19.4) 70-1, 
81 ; conditional test for, (Example 21.12) 
127 ; in variance-analysis, 206-7. 

Bibl. : Ackermann (1939) 442, R. A.
Chapman (1938) 451, Cochran (1936a,
1940b) 452, Copeland and Regan (1936) 453,
Doetsch (1934) 457, Fisher and others
(1922c) 461, Garwood (1936) 464, Irwin
(1935, 1937a) 470, Lévy (1937a) 476, Luders
(1934) 476, Molina (1942) 478, Poisson (1837)
487, Przyborowski and Wileński (1940) 487,
Raikov (1936) 487, Ricker (1937) 488,
Satterthwaite (1943) 490, Student (1907,
1919) 493, Sukhatme (1937b, 1938a) 494,
von Bortkiewicz (1898, 1910) 496, Weida
(1935) 498, Whitaker (1914) 499.

Poisson’s theorem in probability, Bibl., Bochner
(1936) 447, Bonferroni (1933) 447. See 
Central Limit Theorem. 
Pólya distribution, del Chiaro (1936) 456,
S. Guldberg (1936) 466. See Negative
Binomial. 

Polychoric correlations, Bibl., Pearson and Pearson
(1922b) 485, Ritchie-Scott (1918) 489.

Polynomials, expansions in, Bibl., Cacciopolli
(1932) 450, Davis (1933) 455. See Ortho- 
gonal Polynomials, Curve Fitting. 

Population of England and Wales, (Example 
22.7) 161-3, (Examples 22.8, 22.9) 164-7,
(Table 29.2, Figure 29.2) 365. 

analysis, Bibl. : Lotka (1938, 1939) 476,
Pearl and Reed (1923) 482, Volterra (1936)
496.

Potato yields, (Example 21.11) 126. 

Power of a test, 272, 307-8. Bibl. : G. W. Brown
(1939) 449, Dantzig (1940) 456, Eisenhart
(1938) 459, MacStewart (1941) 476, Simaika
(1941) 491, P. L. Hsu (1941b) 469, P. C.
Tang (1938) 494. See also Statistical
Hypotheses. 

Powers of normal variates, Bibl., Haldane (1942a)
467. 

Prediction, see Forecasting.

Pretorius, S. J., N.R., 173.

Principal components, Bibl. : Girshik (1936) 465,
Hotelling (1933, 1936a) 469, Landahl (1938)
474, Ledermann (1938) 475, Thurstone
(1935) 495.

Probability, Bibl. : Bartlett (1933b) 445, Beck
(1936) 446, Belardinelli (1934) 446, Borel
(1939) 447, Broderick (1937) 449, Cantelli
(1932, 1933b) 450, Castelnuovo (1932) 451,
Cramér (1937, 1938, 1939) 454, de Finetti
(1933a, b, 1939a) 456, Doeblin (1938) 457,
Doob (1934b, 1941) 457, Eggenberger (1924)
459, Erdélyi (1937) 459, Khintchine (1937b)
473, Kolmogoroff (1931, 1933a) 473, Lévy
(1931a, 1931c, 1936a, 1937a, 1938a) 475,
Lomnicki (1923) 476, Marchand (1937) 477,
McKinsey (1939) 477, Moisseiev (1937) 478,
Nagel (1936) 479, Reichenbach (1937) 488,
Rice (1938) 488, Romanovsky (1931a) 489,
Tornier (1929, 1930, 1936, 1937) 495, von
Mises (1919a, b, 1928, 1931, 1936a, b, 1939c,
1941) 497, Urban (1918) 496, Uspensky
(1937) 496.

Probits, Bibl., Bliss (1935, 1937) 447.

Product, distribution of, Bibl., C. C. Craig (1936a)
454. 

Product-moment correlation, see Correlation. 

Proficiency test of recruits, (Example 24.7) 240-2. 

Proportionate frequencies, in variance-analysis, 228.

Proportions, tests of, Bibl., Swaroop (1938) 494.

Quadratic forms, see Independence of Quadratic
Forms. 

Quality control, Bibl. : Becker and others (1930)
446, Jennett and Welch (1939) 471, E. S.
Pearson (1933a, 1934) 482, Shewhart (1931)
491, Simon (1941) 491, Welch (1936b) 498,
Wilks (1941) 500, Wolfowitz (1943) 501.

Quartiles, Bibl., Hojo (1931, 1933) 469. 

Quasi-Latin squares, Bibl., Yates (1937a) 502.

Quasi-sufficiency, Bibl., Bartlett (1940) 445. See
Conditional Statistics. 

Racial likeness, N.R., 358. Bibl., Morant (1939)
478, K. Pearson (1926b) 485. See Multi-
variate Analysis.

Rainfall in London, (Table 29.4, Figure 29.4) 367. 

Random component in time-series, 369 ; effect of
trend-elimination on, 378-87 ; tests for, 
399. 

migration, Bibl., Brownlee (1911) 449. 

occurrences, Bibl., Morant (1921) 478. 

order, tests of, 122-7. Bibl. : (runs, etc.)
André (1884) 444, Besson (1920) 446, Borel
(1933) 447, Denk (1936) 456, Fisher (1926b)
461, Gumbel (1943a) 466, Jones (1937c)
472, Kaiicky (1936) 472, Mood (1940) 478,
von Bortkiewicz (1915a, 1917) 496, von
Mises (1921) 497, Wolfowitz (1943) 501.

paths, Bibl., McCrea (1936) 477, Pólya
(1938b) 487.

samples, tables of, Bibl., Mahalanobis and
others (1934) 476.

sampling numbers, Bibl. : Kendall and
Babington Smith (1939a) 472, K. R. Nair
(1938a) 479, Yule (1938a) 503.

sequence, Bibl. : Copeland (1928, 1929,
1932, 1936, 1937) 453, Dörge (1934, 1936)
458, Greville (1939) 466, Regan (1936,
1938) 487, Rice (1939) 488, Swed and
Eisenhart (1943) 494, Ville (1936a, b) 496,
von Mises (1931, 1933) 497, Wald (1936b,
1937) 497, Young (1941) 502.

variables, Bibl. : Cramér (1936a) 454, Cramér
and others (1938) 454, de Finetti (1929)
455, Eyraud (1938b) 459, Lévy (1934,
1935a, b, 1936c, 1939a, b) 476. See Proba-
bility.

Randomisation, and z-test, 209-13, 255-6 ; in
design, 263-6. Bibl., E. S. Pearson (1937b,
1938) 483 ; and see Design.

Randomised blocks, 213-14. Bibl. : Cornish
(1940a) 453, McCarthy (1939) 477, Welch
(1937) 498. See Blocks.

Randomness, Bibl. : Borel (1937) 447, Dodd
(1942) 457, Kendall (1941) 472, Kermack 
and McKendrick (1936, 1937) 473, Wiener 
(1938) 499. 

Range, test of, (Exercise 27.3) 327. Bibl. : Geary 
(1943) 464, Hartley (1942) 467, McKay and 
Pearson (1933) 477, Newman (1939) 480, 
Olds (1935) 481, E. S. Pearson (1926, 1932) 
482, Pearson and Haines (1935a) 482,
Pearson and Hartley (1942, 1943) 483, 
Romanovsky (1933b) 489, W. R. Thompson
(1938) 494, Tippett (1925) 495. 

Rank correlation, 123, 441. Bibl. : Daniels (1944)
455, Dantzig (1939) 455, Dubois (1939) 458,
Hotelling and Pabst (1936c) 469, Kendall
(1938b, 1942a) 472, Kendall and others
(1939, 1939b) 472, Olds (1938b) 481, K.
Pearson (1914, 1921) 484, Pearson and
Pearson (1931c, 1932) 486, “Student”
(1921) 493, Wallis (1939) 498, Watkins
(1933) 498, Woodbury (1940) 501.

Ratio, distribution of, Bibl. : C. C. Craig (1929b)
453, Curtiss (1941) 454, Fieller (1932b) 460,
Geary (1930) 464, Gordon (1941) 466,
Hirschfeld (1937) 468, Kullback (1936a)
474, Nicholson (1941) 481, van Uven (1932,
1939) 496.

Rectangular distribution, estimation of extremes,
(Example 17.16) 28 ; intrinsic accuracy,
(Example 17.11) 47 ; estimation by sample-
centre, (Exercise 17.16) 48 ; confidence
intervals for range, (Exercise 19.1) 83.
Bibl. : O. L. Davies (1932) 456, Dunlap
(1931) 458, Hall (1927b) 467, Olds (1936)
481, Rietz (1931a) 488.

Region of acceptance, 63, 76, 270. 

Regression, Gauss' theorem on residuals, 60-1 ;
generally, 141-74 ; analytical theory,
141-6 ; fitting of curvilinear regressions,
145-63 ; standard errors and tests of sig-
nificance, 163-8 ; equal steps of variate,
159-67 ; multiple curvilinear, 167 ; addi-
tion of new variates, 167-72 ; in analysis
of variance, 233-6 ; relation with Hotelling's
T, 336-7 ; in discriminatory analysis, 344-6.

Bibl. : R. G. D. Allen (1939) 443, H. V.
Allen (1938) 443, Andersson (1932) 443,
(1934) 444, Bartlett (1933a, 1938c) 446, F.
Bernstein (1937) 446, Blakeman (1906) 447,
S. S. Bose (1934a, b, 1938b) 448, Camp
(1926b) 450, Cochran (1938a) 452, Dodd
(1937b, c) 457, Dwyer (1937b, 1941c) 458,
Eisenhart (1939) 459, Ezekiel (1930b) 460,
Fisher (1922b) 461, Galton (1886) 464,
Jones (1937b) 472, Koopmans (1937) 474,
Mendershausen (1937a) 477, T. V. Moore
(1937) 478, Neyman (1926) 480, K. Pearson
(1896) 483, (1921, 1926a) 485, Quensel
(1936) 487, Richards (1931) 488, Roman-
ovsky (1926, 1931b) 489, Slutzky (1914)
491, K. Smith (1918) 492, Waugh (1942)
498, Welch (1936) 498, Wicksell (1934b)
499, Yates (1939d) 502, Yule (1936) 503.

coefficients, standard error of, 163-6 ; exact
tests of, 156-8.

Regular unbiassed critical regions, 318-19. 


Rejection of observations, Bibl. : Irwin (1925b)
470, Pearson and Chandra Sekhar (1930) 
483, Rider (1933) 488, W. R. Thompson 
(1935) 494. 

Relaxed oscillations, Bibl., Le Corbeiller (1933)
476, van der Pol (1930) 496. 

Reliability coefficients, Bibl., Stouffer (1936b) 493.

Replication, 255. Bibl. : Bartlett (1938a) 445,
Cochran (1937b, 1938b, 1939a) 452, Yates
(1933a, b) 500, (1936d) 501. See Design.

Representative method of sampling, Bibl. : A. T.
Craig (1939) 453, Jensen (1926) 471, Ney-
man (1933b, 1934) 480, Sukhatme (1935)
493.

Residual, in variance-analysis, 178, 185-7. 

Ricker, W. E., confidence intervals for Poisson 
distribution, 81. 

Riemann zeta-function, Bibl., Jessen and Wintner
(1935) 471. 

Risk, theory of, Bibl., Cramér (1923) 454, Esscher
(1932) 459. 

Robinson, G., N.R., 394, 437.

Roots of equations, distribution of, Bibl., Girshik
(1939, 1942) 465. 

Routine analysis, Bibl. : Neyman (1939b, 1941b)
480, Przyborowski and Wileński (1935b)
487, “Student” (1927) 493. 

Roy, S. N., distribution of canonical correlations,
357 and N.R., 359.

Runs, in time-series, see Random Order. 

Sampling distributions, moments of, see k-statistics,
Moments.

inquiries, see Design. 

, miscellaneous, Bibl. : Bartky (1943) 445,
Bartlett (1937b) 445, Baten (1933b) 446,
Bowley (1925) 448, Burks (1933) 450, Clap-
ham (1931, 1936) 452, Cochran (1936b,
1939b, 1942b) 452, A. T. Craig (1933a, b)
453, C. C. Craig (1931c) 453, Crum (1933)
454, David (1938b) 455, Hey (1938) 468,
Hilton (1924, 1928) 468, Kiser (1934) 473,
McKay (1934) 477, Neyman (1933a, 1934,
1938a) 480, Olds (1939, 1940) 481, Panse
(1939) 482, E. S. Pearson (1933a, 1934)
482, Pepper (1929) 486, Rhodes (1925) 488,
Rider (1931b) 488, Rietz (1937) 488, Shew-
hart and Winters (1928) 491, “Sophister”
(1928) 492.

surveys, Bibl., A. N. Bose (1941) 447, C.
Bose (1943) 447 ; and see Sampling, miscel-
laneous.

Sasuly, M., N.R., 394.

Savur, S. R., N.R., 83. 

Scale, estimation of parameters of, 40-2 ; elimina- 
tion of parameters of, 79-80 ; Pitman’s 
tests of, 323-6. Bibl., Pitman (1939a, b)
486. 
Scale, reading, Bibl., Yule (1927b) 503.

Scales of measurement, Bibl., Cochran (1943) 452.

Scatterance, N.R., 358.

Scedastic curve, 142.

Scheffé, H., non-parametric tests, 322 ; N.R.,
304, 326.

Schoolchildren, tests of, (Example 25.1) 258-9,
(Example 28.4) 351-2.

Schultz, H., N.R., 394.

Schuster, Sir Arthur, significance of periodogram,
434 ; N.R., 437.

Seasonal effect, in time-series, 369. Bibl. : Bow- 
ley and Smith (1924) 448, Carmichael (1931) 
451, Carver (1932) 451, Crum (1925) 454, 
Detroit Edison Co. (1930) 456, Donner 
(1928) 457, Falkner (1924) 460, Gressens 
(1925) 466, Mendershausen (1937b) 478,
Robb (1929, 1930) 489, Wald (1936a) 497, 
Wisniewski (1934) 501, Zrzavy (1933) 503. 

Second Limit Theorem, Bibl., Fréchet and Shohat
(1931) 463. 

moment, see Variance. 

Seed in optical glass, (Example 23.6) 202-5.

Seeds of wheat, germination of, (Example 23.7) 
207-9. 

Selective confidence intervals, 75-6. 

Semi-normal distribution, Bibl., Steffensen (1937)
492. 

Seminvariants, see Cumulants, k-statistics.

Sensitivity, of tests of significance, 256. 

Serial correlation, 402-4. See Correlogram. Bibl. :
R. L. Anderson (1942) 443, Bartlett (1935c)
445, Dixon (1944) 456, Kendall (1944a, b)
473, Koopmans (1942) 474, Marples (1932)
477, Schumann and Hofmeyr (1942) 490,
Yule (1921) 502, (1926, 1927a) 503.

Sheep population of England and Wales, (Table 
29.3, Figure 29.3) 366, (Example 29.5) 
385-6, (Example 30.5) 411, (Example 30.8) 
416-18. 

Sheppard’s corrections, see Grouping Corrections. 

Shortest confidence intervals, 71-5, 75-6. 

Significance tests, 96-140, 269-327. See Statistical 
Hypotheses. Bibl., Jeffreys (1938a) 471,
Reiser (1943) 486. 

Silverstone, H., minimum variance, 61 ; (Exer- 
cises 18.1, 18.2) 61. 

Simaika, J., N.R., 304, 359. 

Similar regions, 283. Bibl., Feller (1938) 460.

Simon, L. E., N.R., 61.

Simple hypotheses, 269, 272-82, 317-26. 

Simultaneous estimation, of several parameters, 
34-44. 

fiducial distributions, Bartlett (1939a)
445.

Sinusoidal limit, N.R., 394. Bibl. : Marsueguerra
(1936) 477, Romanovsky (1931c, 1932a, 
1933a) 489, Slutzky (19376) 491. 


Skewness, Bibl., Frisch (1934a) 464, Garnel* (1932)
464. 

Skulls (Egyptian), (Example 28.3) 345-8.

Slutzky, E., N.R., 394, 399. 

Slutzky-Yule effect, 378-87, 399. Bibl., Slutzky
(1937b) 491, Yule (1921) 502.

Small numbers, law of, see Poisson Distribution. 

Smirnoff, N., ω²-test, 109.

Smith, H. Fairfield, N.R., 359.

, K., minimum-χ², 55 and N.R., 61.

Smoothing, see Moving Averages, Trend. 

Soil, loss of weight in, (Example 22.3) 149-52,
(Example 22.6) 158. 

Solomon, L., footnote, 51. 

Spearman, C., (Exercise 25.3) 267. 

Spearman’s factor theory, see Factor Analysis. 

ρ, test of, 132.

Speed tests in children, (Example 28.4) 351-2. 

Spelling ability in children (Example 25.1) 258-9. 

Spencer’s formula in curve fitting, (Examples 29.2,
29.3) 376-7, 378-80, (Exercise 29.3) 394-5, 
(Example 30.2) 405. 

Spurious correlation, Bibl. : K. Pearson (1897b)
483, Spearman (1907, 1910) 492, Wicksell 
(1921) 499. 

Square of a variate, Bibl., Haldane (1941) 467.

Squariance, footnote 178. 

Stabilising of variance, 207. 

Stability of series, see Lexis Theory. 

Stable laws of probability, Bibl. : Bochner (1937)
447, Feldheim (1937a) 460, Khintchine and
Lévy (1936) 473, Khintchine (1938) 473.

Standard deviation, estimation of, (Example 17.5) 
6-7, (Example 17.6) 11, 52. See Variance. 

errors, in testing significance, 97-8 ; of
regression coefficients, 163-6. Bibl. : Derk-
son (1939) 456, Edgeworth (1908, 1909)
459, Eels (1929) 459, Hendricks (1934) 468,
Isserlis (1915, 1916) 470, Miller (1934) 478,
K. Pearson (1903, 1913, 1920) 484, (1924d)
485, K. Pearson and Lee (1908) 484, K.
Pearson and Filon (1898) 483.

Latin squares, 259. 

Stationary time-series, 396. Bibl. : Khintchine 
(1932, 1933, 1934) 473, Slutzky (1934) 491, 
Wold (1938a, 1939) 501. See Time-series, 
Correlogram. 

Statistical hypotheses, definition, 269 ; errors of
first and second kind, 270-2 ; power
function, 272 ; simple hypotheses, 272-5 ;
best critical regions, 277-80 ; relation with
sufficient estimators, 281-2 ; composite
hypotheses, 282-3 ; similar regions, 283-7 ;
of several degrees of freedom, 287 ; linear
hypotheses, 292-5 ; likelihood criteria,
295 ; k samples, 295-302 ; bias, 307-26 ;
regions of Type A, 309-14, of Type A₁,
314-16, of Type B, 310-17, of Type C,
317-22 ; limiting properties, 322 ; Pitman’s
tests, 323-6.

Bibl. : G. W. Brown (1940) 449, Chandra
Sekhar and Francis (1941) 451, Daly (1940)
454, Dantzig (1940) 456, Gumbel (1942)
466, R. W. Jackson (1936) 471, Kolod-
ziejczyk (1933, 1936) 474, Neyman (1936b,
1938b) 480, (1942) 481, Neyman and Pear-
son (1928, 1931a, 1933a, c, 1936a, 1938)
480, E. S. Pearson (1941, 1942a) 483,
Pitman (1939b) 486, Rietz (1938) 488,
Scheffé (1942a, 1943) 490, Wald (1939a)
497, (1941a) 498, Wilks (1936c, 1938a) 499,
Wolfowitz (1942) 501.

Statistical Review of England and Wales, data from, 
(Example 21.8) 120, (Example 21.9) 121. 

Stevens, W. L., test of significance in periodogram, 
434 ; N.R., 216.

Stieltjes integrals, Bibl., Shohat (1930) 491.

Stochastic convergence, 440. See Convergence in 
Probability. 

dependence, see Independence. 

processes, Bibl., Doob (1934a, 1937, 1938)
457, Feller (1936a) 460. See Probability.

Stock forecasting, Bibl., Cowles (1933) 453, Cowles
and Jones (1937) 453.

Stock, J. S., N.R., 266.

Stratified sampling, 249-62. Bibl. : P. H. Ander-
son (1942) 443, Baker (1930c) 444, G. M.
Brown (1933) 449, Frankel and Stock (1939)
463, McKay (1934) 477, Mood (1943) 478.
See also Sampling, miscellaneous, Repre-
sentative Method.

“Student” (W. S. Gosset), see Gosset.

Studentisation, 79-81, 134. Bibl., Hartley (1938,
1944) 467, Newman (1939) 480. 

“Student’s” distribution, confidence intervals
based on, 79-80 ; fiducial inference based
on, 88 ; properties of, 100-2 ; in testing
mean, 98-100 ; in non-normal case, 102-4 ;
other uses, 104 ; in testing two means,
109-10, 113-14 ; in testing Spearman’s ρ,
124 ; in Pitman’s tests, 131, 132 ; in testing
regressions, 166, 168, 172 ; in analysis of
covariance, 244 ; (Example 26.9) 291.

Bibl. : Bartlett (1936a) 446, C. C. Craig
(1941a) 454, Daniels (1938a) 454, Fisher
(1926a) 461, Geary (1936b) 464, Hendricks
(1936) 468, P. L. Hsu (1938a) 469, N. L.
Johnson and Welch (1940a) 471, Kerrich
(1937) 473, Kolodziejczyk (1933) 474, Lader-
mann (1939) 474, McKay and others (1932)
477, Merrington (1942) 478, A. N. K. Nair
(1942) 479, Perlo (1933) 486, Rider (1929)
488, Rietz (1939) 488, Steffensen (1936) 492,
“Student” (1908a, 1931a) 493, Treloar and
Wilder (1934) 496.

hypothesis, 286-7. Bibl., Neyman and
Tokarska (1936b) 480, Przyborowski and
Wileński (1936a) 487.

Stumpff, K., N.R., 437. 

Sufficient estimators, 7-12 ; given by maximum
likelihood, 19 ; general form possessing,
24-6 ; distribution of, 26 ; when range
depends on parameter, 27-8 ; for several
parameters, 39-40 ; giving minimum-
variance estimators, 52 ; relation with
confidence intervals, 74-6, 79 ; relation
with U.M.P. tests, 281-2, with U.M.P.U.
tests, 310.

Bibl. : Bartlett (1936b, 1937c, 1940) 446,
Darmois (1936) 456, Koopman (1936) 474,
Neyman (1936a) 480, Neyman and Pearson 
(1936a) 480, Pitman (1936) 486, Welch 
(1939a) 498. 

Sukhatme, P. V., tables for Behrens’ test, 92, 111 ; 
(Exercise 26.8) 305-6 ; sampling moments, 
440. N.R., 94, 266, 304.

Sum, distribution of, see Means. 

Summation convention, 329. 

Sunspots, Bibl., Schuster (1906) 490, Yule (1927a)
503.

Symmetric functions, Bibl., O’Toole (1931, 1932)
481. See Moments, k-statistics.

T-distribution, see Hotelling’s T. 

Tabular differences, Bibl., Ladermann and Lowan
(1939) 474. 

Tanbum, E., N.R., 137.

Tang, P. C., linear hypotheses, 301 ; N.R., 303.

Tchebycheff, P. L., (Exercise 22.4) 173 ; N.R., 172.

Tchebycheff-Hermite polynomials, Bibl. : Doetsch
(1934) 457, Erdélyi (1938) 459, Feldheim
(1937b) 460. See Gram-Charlier Series,
Orthogonal Polynomials. 

Tchebycheff’s inequality, Bibl. : Berge (1938)
446, Bernstein (1937) 446, Camp (1922) 450,
C. C. Craig (1933) 454, K. Pearson (1919)
486, C. D. Smith (1930) 491.

Tea-drinking, Bibl., Mahalanobis (1943) 476.

Telephone service, Bibl., Newland and Neal (1939)
479, Palm (1937) 482. 

Terminals of frequency-distribution, confidence
intervals for, 83. 

Test construction, Bibl., Cureton and Dunlap
(1938) 454.

Tests of significance, see Significance, Statistical 
Hypotheses. 

Tetrachoric functions, Bibl. : J. Henderson (1922)
468, K. Pearson (1912a, 1913a, b) 484, K.
Pearson and Heron (1913c) 484, Newbold
(1926) 479, Pearson and Pearson (1922b)
485. 

Tetrad difference, (Exercise 28.10) 362. Bibl.,
Hotelling (1936b) 469, Wilks (1932d) 499.
See Factor Analysis. 
Third moment, distribution of, Pepper
(1932) 486.

Thompson, C., on A-tests, 299 ; N.R., 303.

Thompson, W. R., (Exercise 19.6) 84 ; N.R., 83.

Thomson, G., (Example 26.1) 268-9.

Ties in ranking, 127, 441. 

Time-series, 363-439 ; examples of, 363-9 ; trend,
371-8 ; effect of trend elimination, 378-87 ; 
variate difference method, 387-94 ; oscilla- 
tions, 397-9 ; tests for randomness, 399 ; 
types of oscillatory series, 396-402 ; serial 
correlations, 402-4 ; correlogram, 404-13 ; 
autoregressive schemes, 414-21 ; auto- 
correlation function, 421-3 ; periodogram 
analysis, 423-33 ; significance of a periodo- 
gram, 433-6 ; lag correlation, 436-7. 

Bibl. : Bartels (1936) 446, Darmois (1929)
456, Davis (1941) 456, Jones (1937b, c) 472,
Kendall (1944a, b) 473, Koopmans (1937,
1940, 1941) 474, Macaulay (1931) 476,
Roos (1934, 1936) 489, von Szeliski (1929)
497, Wallis and Moore (1941) 498, Wold
(1938a) 501, Zaycoff (1936, 1937) 503.

See also Correlogram, Harmonic Analysis,
Periodicity.

Tintner, G., variate-difference method, 393. N.R., 
394. 

Tokarska, B., N.R., 303.

Tolerance limits, see Quality Control.

Trade cycles, see Periodicity. 

Traffic signals, Bibl., Garwood (1940) 464. 

Transformation of distributions, Bibl. : Baker
(1930a, 1934) 444, Beall (1942) 446, Bliss
(1938) 447, Curtiss (1943) 454, Frankel and
Hotelling (1938) 463, Landahl (1938) 474,
Rietz (1931b) 488, Tricomi (1938) 496,
Yasukawa (1926) 501, Zoch (1934) 503.

Transvariation, Bibl., Castellano (1934, 1937) 451.

Travers, B. M. W., N.R., 369. 

Trend, 369-70, 371-87. Bibl. : Lorenz (1931,
1936) 476, Macaulay (1931) 476, Rhodes
(1921) 488, Sasuly (1934) 490, Schumann
(1938) 490, Sipos (1930) 491, Working and
Hotelling (1929) 501.

Trough, in time-series, 124.

Truncated normal distribution, Bibl., Keyfitz 
(1938) 473, Stevens (1937a) 493. 

Turner, H. H., N.R., 437.

Turning-point, in time-series, 124. 

Two samples, Bibl. : Behrens (1929) 446, Dixon
(1940) 456, P. L. Hsu (1938a) 469, Lengyel
(1939) 476, Mathisen (1943) 477, E. S.
Pearson (1929) 482, Pearson and Neyman
(1930) 482, K. Pearson (1911a) 484, (1931a)
486, Peek (1937) 486, Rhodes (1924, 1926)
488, Romanovsky (1928) 489, Starkey
(1938) 492, Sukhatme (1936, 1936b) 493,
Swaroop (1938) 494, W. R. Thompson
(1933) 494, Wald and Wolfowitz (1940c)
498, Welch (1938a) 498, Yates (1939f)
501.

Type A, B, C, in statistical tests, 309-27. 

Type I distribution, (Exercise 17.17) 49. 

II distribution, Bibl., Carlson (1932) 451.

III distribution, estimation of parameters
in, (Example 17.8) 20-1, (Example 17.13)
26, (Example 17.19) 39, (Example 18.3)
53-4 ; sufficiency, (Example 17.21) 40 ;
centre of location of, (Example 17.23) 42 ;
confidence intervals for parameter (Example
19.6) 74-6 ; fiducial distribution of para-
meter, 87. Bibl. : C. C. Craig (1929a) 453,
Kullback (1936a) 474, Olshen (1938) 481,
Salvosa (1930) 490, Wicksell (1933) 499.

IV distribution, centre of location of, (Exer- 
cise 17.16) 48 ; intrinsic accuracy of, 
(Exercise 17.19) 49. 

Unbiassed estimators, 3-4 ; confidence intervals,
76 ; tests, 309-27. 

Unequal subclasses, in variance-analysis, 220-4. 
Bibl. : Brandt (1933) 449, Wald (1940b)
497, (1941d) 498, Wilks (1938e) 500, Yates
(1934a) 501.

Uniformly most powerful tests, 276 ; unbiassed
tests, 309, N.R., 369.

U-shaped distribution, Bibl., Holzinger and Church 
(1929) 469. 

Variability, measures of, Bibl. : Castellano (1936)
451, de Vergottini (1936) 456, Galvani
(1931) 464, Gini (1912, 1930) 466, Mamh
(1926) 477, Pietra (1932a) 486, Vinci (1920)
496.

Variance, analysis of, see Analysis of Variance. 

, distribution and tests of, Bibl. : Baker
(1931, 1932, 1935, 194^) 444, Church
(1926, 1926) 452, A. T. Craig (1932, 1938)
453, Dunlap (1931) 458, Fertig and Proehl
(1937) 460, Greenwood and Greville (1939)
466, Kondo (1930) 474, Le Roux (1931)
476, K. Pearson (1931d) 486, Quensel
(1938) 487, Rhodes (1927) 488, Rietz (1931a)
488, Romanovsky (1926a) 489, Truksa
(1940) 496, von Bortkiewicz (1922) 497,
Yasukawa (1925) 501. See also Fisher’s
Distribution, k samples.

, estimation of, Bibl., O. L. Davies and
Pearson (1934) 456, P. L. Hsu (193^) 469.

ratio, Bibl. : S. S. Bose (1936) 448, Cochran
(1941) 452, Finney (1938, 1941a) 460,
Morgan (1939) 478, U. S. Nair (1941a, b)
479, Scheffé (1942b) 490. See also Fisher’s
Distribution.

, test of, in normal samples, 104 ; difference
of two variances, 116, (Example 26.8) 289.
Variate-difference method, 387-94. Bibl. : Anderson
(1914, 1923, 1926, 1929) 443, Cave-Browne-Cave
(1904) 451, Cave and Pearson
(1914) 451, Haavelmo (1941) 467, K.
Pearson and Elderton (1923a) 486, Robb
(1929) 489, “Student” (1914) 493, Tintner
(1936, 1940, 1941) 496, Zaycoff (1936, 1937)
503.

Variate transformations, in analysis of variance, 
206-9. See Transformation. 

Variation, coefficient of, Bibl. : Hendricks and
Robey (1936) 468, McKay (1931) 477, 
McKay and others (1932) 477. 

Variety trials, Yates (1936d, 1937a) 502.

Vector correlation, alienation coefficients, (Exer- 
cises 28.8, 28.9, 28.10) 361-2. 

representation of a sample, Bibl., Bartlett
(1934b) 445.

von Mises, R., ω²-test, 108 ; Irregular Kollektiv,
123.

Wald, A., most-selective confidence intervals,
82-3 ; limiting properties of tests, 322,
N.R., 83, 304, 326.

Walker, Sir Gilbert, time-series, 420 ; significance 
of a periodogram, 434. 

Wallace, N., N.R., 369.

Wallis, W. A., phases in time-series, 126, 136. 

Water-content in samples, (Example 23.3) 190-4, 
(Example 23.4) 196-8. 

Weather, effect on moths, (Example 22.10) 171-2. 

Welch, B. L., difference of two means, 112, 
(Example 21.6) 113 ; (Exercise 21.7) 139 ;
Latin squares, 261 ; footnote 296. N.R.,
45, 83, 216, 304, 369.

Wheat-price index (of Sir William Beveridge), 
(Table 30.1) 396, (Example 30.4, Table 30.6, 
Figure 30.5) 409-10 ; (Table 30.9 and 
Figure 30.9) 426-30 ; (Example 30.6) 
431-2 ; (Example 30.10) 436. 

Wheat prices, and horse population, (Table 30.10) 
436. 


Whittaker, Sir Edmund, periodogram (Exercise
30.10) 439, Calculus of Observations, N.R.,
394, 437.

Wicksell, S. D., theorem on regressions, 143 ;
(Example 22.2) 144 ; (Exercises 22.1, 22.2,
22.3) 173. N.R., 172, 173.

Wiener, N., autocorrelation function, 422. 

Wilks, S. S., shortest confidence intervals, 82 ;
λ-tests, 299 ; Hotelling’s T, 337-8 ; distribution
of means, 341, 368 ; (Exercise 19.1) 83, (Exercise
19.4) 84, (Exercises 28.4, 28.6) 360. N.R., 83,
246, 303, 304, 369.

Wilsdon, B. H., N.R., 246.

Wilson-Hilferty transformation of χ², 110.

Wishart, J., (Exercise 24.3) 246, (Exercises 28.1,
28.2) 359-60. N.R., 245, 369.

Wishart’s distribution, 330-6, 337-8, (Exercise
28.3) 360. Bibl. : P. L. Hsu (1939a) 469,
Ingham (1933) 470, Wishart (1928) 500,
Wishart and Bartlett (1933c) 500.

Wold, H., ω²-test, 108 ; (Exercise 26.3) 267 ;
time-series, 418 ; Carleman criterion, 440.
N.R., 266, 437.

Wolfowitz, J., confidence intervals for terminals
of a distribution, 83. N.R., 304.

Woodbury, M., tied ranks, 441. 

Wool thread, weights of, (Example 23.2) 183-6. 


Yates, F., tables of t, 102 ; (Example 23.6) 200-2 ; 
z-distribution, 206 ; (Example 23.8) 214 ; 
(Example 24.1) 221-6 ; (Example 24.6) 
230-3 ; design of experiments, 263. N.R.,
94, 216, 246, 266.

Yule, G. U., autoregressive series, 418 ; (Exercises 
30.3 and 30.9) 439. N.R., 394, 437.


Zaycoff, R., variate-difference method, 393. N.R.,
394.

z-distribution, see Fisher’s Distribution. 





